I. INTRODUCTION

The design of very large scale integrated (VLSI) circuits
is a very time consuming process. To reduce the time and
cost required to design VLSI circuits, various silicon
compilers have been developed [Ref. 1]. One of these
compilers, the MacPitts silicon compiler, was developed at 
MIT Lincoln Laboratory in 1981-1982 [Refs. 2 and 3]. 

The MacPitts silicon compiler is a large and complex 
computer program that frees the circuit designer from having 
to worry about the details of the actual design and layout 
of the circuit. From a short program (usually less than
fifty lines) that contains a functional description of the
desired circuit, MacPitts completely designs an
implementation of the VLSI chip and outputs a file in Caltech
Intermediate Form (CIF) that describes the circuit. The CIF
file can be used to perform a functional simulation or a
timing analysis of the circuit. After the functional
correctness of the circuit has been verified, the CIF file
can be sent to a silicon foundry so that the circuit can be
fabricated.

The MacPitts compiler has been used previously at the 
Naval Postgraduate School by Carlson [Ref. 4] to design a 
pipeline multiplier circuit. Carlson's thesis contains a
description of the MacPitts language, which is used to write 
the .mac program that contains a functional description of 
the circuit to be designed by the compiler. Also, a 
detailed procedure on how to write the .mac program is 
given. Carlson also shows how to use the MacPitts
interpreter to test the functional correctness of the .mac
program before the circuit design is performed by the
compiler. In
addition, Carlson's thesis gives a detailed listing of the 
activities in the MacPitts design cycle used to design VLSI 
circuits. The design cycle includes generating the .mac 
program, submitting the .mac program to the compiler for 
circuit design and performing a design rule check and 
functional event simulation on the designed circuit. 

Since a good understanding of how to use the MacPitts
silicon compiler to design VLSI circuits was obtained by
Carlson [Ref. 4], it was decided that the next logical step
was to learn more about the MacPitts architecture and to 
make some performance comparisons between various MacPitts 
and hand-crafted designs. The first goal of this thesis 
research was to determine what basic building blocks the 
compiler used to design VLSI circuits and how these building 
blocks are used to implement different circuits. Also, an 
understanding of how the statements in the .mac program 
determine the structure of the MacPitts designed circuit was 
desired.

It was decided to use the MacPitts compiler to design
various adder circuits so that performance (chip size, power 
and speed) comparisons could be made between the MacPitts 
designs and a hand-crafted pipeline adder circuit designed 
by Conradi and Hauenstein [Ref. 5]. Crystal, the VLSI 
timing analysis program developed at the University of 
California at Berkeley [Ref. 6], was used to analyze the 
timing requirements of all circuits being compared. Since 
Crystal had never been used at the Naval Postgraduate School 
before, a procedure on how to use Crystal to analyze MacPitts 
designs had to be developed. This required adapting the 
basic Crystal commands to the unconventional MacPitts
three-phase overlapping clock scheme. 

The third research goal was to obtain a more complete 
understanding and description of the MacPitts interpreter 
than currently available in the literature. Reference 2 
and reference 4 describe how to use the interpreter, but a 
detailed description of the interpreter commands and error 
statements and its capabilities and limitations is not 
available. 

Chapter II of this thesis describes the basic circuit 
building blocks of the MacPitts compiler. The design of 
several combinational and pipeline adder circuits is 
presented in Chapter III. Chapter IV lists performance 
comparisons between the MacPitts adders and the Conradi and
Hauenstein adder [Ref. 5] along with a tutorial on Crystal.
Design errors that have been found in MacPitts designs are 
detailed in Chapter V. Tutorial material on the MacPitts 
interpreter is found in Appendix A.

II. THE MACPITTS DESIGN ARCHITECTURE

A. INTRODUCTION 

The MacPitts design structure consists of five main
components: the chip design frame with pads, the data-path,
the sequencer, the Weinberger array and the flags block
(see Figure 2.1). Input ports or signals are used to
bring input data into the chip and output ports or signals 
are used to output data from the chip. The difference 
between ports and signals is that a port has as many bits as 
the data word defined by the programmer in the MacPitts .mac 
program and a signal is only a one-bit data element. 

B. THE DESIGN FRAME 

The MacPitts compiler was designed to have no limit on
the size of a circuit that it would design, although large
circuits may take several days of computer time to be
completed. The design constraints that must be used for
practical designs are the MOSIS chip size and pad number 
fabrication limitations. The current MOSIS limitation for 
the chip size is 7900 x 9200 microns and the maximum number 
of pads is 84. 
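
As a quick illustration of these constraints, the following Python sketch checks a candidate chip against the limits quoted above. The function name, the example chip and the allowance for rotating the chip by 90 degrees are assumptions made for this example; they are not part of the MacPitts compiler or of any MOSIS tool.

```python
# Limits quoted in the text above (microns and pad count).
MOSIS_MAX_DIMS_UM = (7900, 9200)
MOSIS_MAX_PADS = 84


def fits_mosis(width_um, height_um, pads):
    """Return True if a chip meets the size and pad limits.

    Assumption for this sketch: the chip may be rotated by 90 degrees,
    so only the sorted dimensions are compared against the limits.
    """
    small, large = sorted((width_um, height_um))
    limit_small, limit_large = sorted(MOSIS_MAX_DIMS_UM)
    return (small <= limit_small
            and large <= limit_large
            and pads <= MOSIS_MAX_PADS)


if __name__ == "__main__":
    # Hypothetical chip: 4828 x 2918 microns with 19 pads.
    print(fits_mosis(4828, 2918, 19))
```

A check of this kind is only a first filter; the flags block and Weinberger array dimensions discussed later in this chapter determine whether a given .mac program actually stays inside the frame.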

Figure 2.1. MacPitts Chip Design Structure

All pads are defined in the "def" section of the MacPitts
.mac program and are placed around the chip in the order
specified in the program, starting in the upper left-hand
corner of the chip and continuing in a clockwise direction.

The version of the MacPitts silicon compiler installed at the 
Naval Postgraduate School will not place pads on the left edge 
of the chip. A newer version of the compiler that is 
available commercially places pads on all four sides of the 
chip. All output pads are super buffered but the input data 
and clock pads are not. 

Along with the ground and power pads, the three-phase 
clock pads must also be defined in all MacPitts programs 
even though the clock may not be used in the circuit. The 
clock bus is always laid out on the chip. The MacPitts 
compiler uses a three-phase overlapping clock scheme where 
the clock period is divided into five segments as shown in 
Figure 2.2. This unusual clock scheme is used to drive the 
data storage registers and flags (see paragraphs C and E 
below) and according to [Ref. 4] allows a more compact 
layout of the registers and flags. 

A reset pad must also be defined if the "process" form 
is used in the .mac program even if the reset function is 
not used anywhere in the program. This is because the 
MacPitts compiler may use the reset signal in its internal 
algorithms when it generates the chip [Ref. 2]. If the 
"always" form is the only form used in the .mac program the 
reset pad is not required.

C. THE DATA-PATH

The data-path is the unit where all word size operations 
are performed. These operations consist of arithmetic 
functions (addition, subtraction, incrementing, decrementing 
and equals), boolean functions (and, or, not, nand, nor,
xor and equ), data shifting operations, comparison tests and
data storage and transfer using registers and ports [Refs. 2 
and 3]. The structure consisting of a one-bit slice of the 
above operations is referred to as an organelle and the LISP 
code used by the MacPitts compiler to generate each organelle 
can be found in the library and organelle sections of the 
MacPitts source code listing. 

The size of the data-path is determined by the number 
of bits in the data word (specified at the beginning of the 
.mac program) and the number of word size operations to be 
performed. The number of bits in the data word specifies 
the height of the data-path. The larger the data word the 
taller the data-path. The width is determined by the number 
of functions performed. When a specific function is to be 
performed in the data-path the organelle that performs that 
function is placed in the data-path. Replicas of that 
organelle (one for each bit of the data word) are stacked on 
top of each other. The organelle for the most significant 
bit of the data word is on the bottom of the stack and the 
organelle for the least significant bit is on the top. The 
ordering of the organelles in the data-path is determined
by the ordering of the word operations specified in the .mac
program. The first word operation encountered by the 
compiler in the .mac program is the first organelle in the 
data-path and so on. The compiler takes into account the 
size requirements of each organelle to scale the amount of 
space between organelles to allow enough room for connection 
lines, control lines, power lines, and local interconnection 
buses. Power and ground buses are also sized based on 
organelle power requirements [Ref. 3]. 

The routing of data to and from the data-path is very 
inefficient and requires many data lines to be longer than 
necessary. As seen in Figure 2.1, the chip pads are placed 
only on the top, right and bottom sides of the chip. Data 
entering the chip on input ports and exiting the chip 
through output ports is routed from the left side of the 
data-path. This causes very long data lines. Data from the 
data-path to the Weinberger array is routed from the bottom 
side of the data-path to the top side of the array. 

All arithmetic and boolean function organelles are 
implemented using three basic gate structures. They are the 
NAND, NOR and inverter. Figure 2.3 shows an AND organelle 
that is made from a NAND gate and an inverter. In Figure 2.4 
an XOR function is implemented using NAND gates. An OR gate 
is implemented from a NOR gate and an inverter and the 
boolean EQU function is implemented using four NOR gates.

Figure 2.3. AND Organelle Structure

Figure 2.4. Exclusive-OR Organelle Structure

The one-bit full adder circuit in Figure 2.5 shows how
a complex function can be implemented by putting several 
different organelles together. In this case two XOR 
organelles and one NAND organelle are used. Figure 2.6 
shows how the one-bit adder is used by MacPitts to build a 
two-bit full adder circuit with carry in. 
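
The gate constructions described above can be sketched in Python as follows. This is a minimal sketch, not the organelle code of the MacPitts library: the four-NAND exclusive-OR, the four-NOR EQU and the NAND-based carry are textbook wirings assumed here, since the exact wiring of Figures 2.3 through 2.6 is not reproduced in the text, and all function names are hypothetical.

```python
def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


def inv(a):
    return 1 - a


def and_gate(a, b):
    # AND built from a NAND gate followed by an inverter.
    return inv(nand(a, b))


def or_gate(a, b):
    # OR built from a NOR gate followed by an inverter.
    return inv(nor(a, b))


def xor_gate(a, b):
    # Exclusive-OR from four NAND gates (assumed wiring).
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


def equ_gate(a, b):
    # EQU (exclusive-NOR) from four NOR gates (assumed wiring).
    n1 = nor(a, b)
    return nor(nor(a, n1), nor(b, n1))


def full_adder(a, b, cin):
    # Two XOR stages form the sum; the carry is formed with NAND gates.
    p = xor_gate(a, b)
    s = xor_gate(p, cin)
    cout = nand(nand(a, b), nand(p, cin))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two bit lists (least significant bit first) by rippling the carry."""
    out = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry
```

Calling `ripple_add` with two-element lists corresponds to the two-bit adder with carry in: each list element plays the role of one stacked one-bit slice.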

Two different structures are used in the data-path to 
define and store organelle inputs and outputs. They are the 
internal port and the register, and both are the same size 
as the data word. Internal ports are used primarily to 
transfer the output of an organelle to another organelle or 
to the Weinberger array within the same clock cycle or state 
period [Ref. 2]. Registers are used to store word size data
elements. A one-bit register organelle consists of a 
master-slave flip-flop, as shown in Figure 2.7, that is 
controlled by the MacPitts three-phase clock [Ref. 3]. This 
structure allows the output of the register to be valid 
during a clock cycle even though a new input value could be 
in the process of being clocked into the register. The 
enable line in Figure 2.7 is used to control which clock 
cycles the register samples the input line for data storage. 
A memory refresh cycle is performed if new data is not 
stored during a clock cycle. If data is to be stored in 
every clock cycle the enable line is connected to Vdd.

Figure 2.5. MacPitts Full Adder Design

Figure 2.6. 2-Bit Full Adder Circuit

Figure 2.7. 1-Bit Register Cell

D. THE SEQUENCER

The sequencer is a mini data-path and is placed on the 
chip between the data-path and the flags block. If the .mac 
program contains a process whose value depends on the system 
state, a sequencer is placed on the chip to control the 
system state of the chip. The sequencer usually contains 
registers to store the current system state. Every clock 
cycle the current system state is transferred to the 
Weinberger array from the registers and then the new system 
state is transferred from the Weinberger array to the 
sequencer for storage. Additional details about the 
sequencer are given in [Refs. 3, 7 and 8]. 

E. THE FLAGS BLOCK 

Flags have a similar function to registers, but they
store only one bit of data from the Weinberger array. Flags
also have a master-slave flip-flop structure but extra 
inverters are used in the flags block to drive the clock 
signals because there may be as many as twenty or thirty 
flags in the flags block (see Figure 2.8). The enable line 
of a flag performs the same function as the enable line of a 
register. Flags are placed side by side with the flags block 
increasing in width as more flags are needed. The rightmost 
structures in the flags block are the six inverters used to 
drive the three clock lines (see Figure 2.9). The leftmost
flag in the flags block is the first flag encountered by the
compiler in the "always" or "process" section of the .mac
program. Each subsequent flag encountered by the compiler is
placed on the right of the previous flag. Since the flags
block cannot expand in the vertical direction there is
wasted space on the chip above the flags block if the
data-path or sequencer is taller than the flags block. Also,
if the .mac program requires a large number of flags the
width of the flags block may make the dimension of the chip
exceed the MOSIS chip size constraints.

Figure 2.8. 1-Bit Flag Cell

Figure 2.9. MacPitts Layout of a Four Cell Flags Block

F. THE WEINBERGER ARRAY 

The Weinberger array, or control unit, of a MacPitts
designed chip is the unit where all chip control signals are
generated and bit size boolean functions are performed. All 
inputs and outputs to the array are routed to the top of the 
array. Input and output signal lines are routed around the 
left side of the array and then to the top. 

The data lines connecting the Weinberger array to the 
data-path, sequencer, and flags block are called the "river". 
The algorithm that routes the "river" does not allow the
data lines to cross each other so the left-to-right ordering
of the functions performed in the array is determined by the 
left-to-right ordering of the data transferred from the 
data-path, sequencer and flags block to the array. Array 
functions that use data from the data-path are placed in the 
left section of the array, array functions that use data from
the sequencer are placed in the center section of the array
and array functions that use data from the flags block are 
placed in the right section of the array. Since no data 
lines in the "river" can cross each other data that is 
transferred between the data-path, sequencer or flags block 
must pass through the array even though no function is 
performed on the data in the array. 

The Weinberger array consists of a regular structure of 
NOR gates having arbitrary numbers of inputs. The pull-up 
transistors of the NOR gates are connected to Vdd at the 
bottom of the array and run vertically the full height of 
the array. Vertical ground wires run parallel to the pull-up 
transistor lines from the ground bus at the top of the 
array. Inputs to the NOR gates run horizontally through the 
array and form pull-down transistors when connected to ground 
and the NOR gate output line. The NOR gate output lines also 
run horizontally through the array and may be used as input 
lines to other NOR gates or routed to a flag or signal output 
pad. As more NOR gates are added to the Weinberger array or 
more inputs or outputs are added to each gate the array 
increases in width. The height of the array is determined 
by the number of horizontal interconnections between the NOR 
gates [Ref. 7].

Eight different boolean functions are implemented in the
Weinberger array, all with NOR gates: NOR, AND, NAND, OR,
EQU, XOR, parity and NOT [Ref. 2]. Figure 2.10 shows how an
XOR function is implemented using NOR gates. The stick
diagram of Figure 2.11 shows the Weinberger array
implementation of the XOR function from Figure 2.10 and
Figure 2.12 shows an actual Weinberger array layout of this
function.

Figure 2.10. NOR Gate Implementation of the XOR Function
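
To make the NOR-only construction concrete, the following Python sketch builds the eight functions listed above from a single multi-input NOR primitive. It is an illustration under stated assumptions rather than the decomposition MacPitts actually generates: the gate networks are textbook ones, "parity" is taken here to mean the exclusive-OR of all inputs, and the function names are hypothetical.

```python
def nor(*inputs):
    """NOR gate with an arbitrary number of inputs (values 0 or 1)."""
    return 0 if any(inputs) else 1


def not_(a):
    # A one-input NOR gate acts as an inverter.
    return nor(a)


def or_(a, b):
    return nor(nor(a, b))


def and_(a, b):
    return nor(nor(a), nor(b))


def nand_(a, b):
    return nor(and_(a, b))


def xor_(a, b):
    # a XOR b is true when neither "both low" nor "both high" holds.
    return nor(nor(a, b), and_(a, b))


def equ_(a, b):
    return nor(xor_(a, b))


def parity_(*bits):
    # Assumed meaning: 1 when an odd number of inputs are 1.
    result = 0
    for bit in bits:
        result = xor_(result, bit)
    return result
```

Because every function reduces to nested `nor` calls, the nesting depth of each expression corresponds to the NOR gate depth that the array must accommodate.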

The PLA and Weinberger array structures are very 
similar but there are several important differences. First, 
the PLA has only two levels of logic, the AND and the OR 
planes. The Weinberger array can have an arbitrary NOR gate 
depth. Although a PLA can implement the same functions 
performed in the Weinberger array the MacPitts designers 
found that when a boolean function was normalized in the
sum-of-products form the Weinberger array's NOR gate depth
allowed a much more compact structure than the PLA's
[Ref. 3]. Another difference is that the complement of each
input signal does not have to be available at the input of 
the Weinberger array as a PLA requires. The complements of 
array inputs are generated in the array if they are required. 

It has been found that the generation of the Weinberger
array usually takes from 90% to 95% of the compilation time
in generating a MacPitts design. When an 8-bit 5-stage
pipeline adder was designed using the MacPitts compiler,
162 CPU minutes (about eight hours on a lightly loaded
computer system) were required to complete this design. Most
of this time was required to lay out the 228 vertical
control columns (the number of array inputs and outputs plus
the number of NOR gates in the array) and the 81 horizontal
control tracks (the number of NOR gate inputs and outputs in
the array). When a 16-bit 5-stage pipeline adder design was
attempted, which contained 435 control columns and 157
control tracks, the design process was killed after 4800 CPU
minutes (four days) were spent designing the Weinberger
array. When the size of the Weinberger array of a 4-bit
5-stage pipeline adder (126 columns and 43 tracks) is
compared with the size of the 8-bit and 16-bit adders it
can be seen that the Weinberger array becomes nearly four
times larger and more complex in this 5-stage pipeline design
when the size of the data word is doubled.

Figure 2.11. Stick Diagram of the Weinberger Array
Implementation of an XOR Function

Figure 2.12. Weinberger Array Layout of an XOR Function
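
The growth quoted above can be checked with a short calculation. The sketch below uses the column and track counts from the text and takes the product of columns and tracks as a rough measure of array size; that measure and the function names are assumptions made for this example.

```python
# (columns, tracks) of the Weinberger array, as quoted in the text.
ARRAYS = {4: (126, 43), 8: (228, 81), 16: (435, 157)}


def array_cells(bits):
    """Grid size (columns x tracks), used here as a rough size measure."""
    columns, tracks = ARRAYS[bits]
    return columns * tracks


def growth(bits_small, bits_large):
    """Ratio of the larger array's grid size to the smaller one's."""
    return array_cells(bits_large) / array_cells(bits_small)


if __name__ == "__main__":
    print(f"4-bit to 8-bit:  {growth(4, 8):.2f}x")
    print(f"8-bit to 16-bit: {growth(8, 16):.2f}x")
```

With these figures the grid grows by a factor of roughly 3.4 from 4 to 8 bits and roughly 3.7 from 8 to 16 bits, which is consistent with the "nearly four times" estimate.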

III. THE DESIGN OF ADDER CIRCUITS


A. COMBINATIONAL ADDERS 

The design of combinational adder circuits with the 
MacPitts compiler is more straightforward than the design of 
pipeline adder circuits. The output sum of a combinational 
adder depends only on the present inputs to the circuit. 
Unfortunately, several compiler design constraints cause the
combinational adder design to be more complicated than
necessary.

The compiler adds two input vectors (ain and bin) in 
the data-path using the ripple carry full adder circuit 
shown in Figure 2.5. The first problem occurs when trying to 
add the input carry (cin) to the first bit of ain and bin. 
Since cin is a one-bit sized data element and the data-path 
can only manipulate word size data elements cin must be 
converted to a word sized data element. This requires 
additional circuitry in the data-path and the Weinberger 
array and also additional statements in the MacPitts .mac 
program. 

A second problem occurs because the MacPitts language 
in which the .mac program is written allows only two 
variables in the addition function [Ref. 2]. All MacPitts 
functions are limited to one or two variables. It is assumed 
that the number of variables in a MacPitts function was
limited by the compiler designers to simplify the design of
the compiler. The simple LISP addition function of 

(+ ain bin cin) 

is accomplished in MacPitts with the more complicated 
function 

(+ ain (+ bin cin)). 

This embedded addition causes two full adder circuits to be 
connected in cascade. In the first full adder bin is added 
to cin and this sum is added to ain in the second full adder. 

A third problem is that the carry in and carry out 
lines of the full adder cannot be addressed by the 
programmer. They are only used to ripple the carry bits 
between full adder stages. The carry in of the bit 0 full 
adder is connected to ground; the carry out of the last 
full adder stage is not connected to anything. If a chip 
carry out is desired it must be generated by additional 
circuitry in the Weinberger array. 

Figure 3.1 shows a block diagram of the data-path for a
two-bit combinational adder circuit with carry in. In
Figure 3.2 the .mac program for a 4-bit combinational adder
is shown. Lines 14 and 15 convert the carry in signal to a
word size data element. The least significant bit of the
carry in word is set to 1 or 0 depending on the value of the
carry in signal. All other bits of the carry in word are
set to 0 regardless of the value of the carry in signal.
This can be seen on the left side of Figure 3.1. Figures 3.3
and 3.4 show the 4 micron MacPitts designs for a 4-bit and an
8-bit combinational adder. The size of the 4-bit adder is
2.292mm x 2.398mm and the size of the 8-bit adder is 3.508mm
x 3.614mm. As shown, the size of the chip frame is larger
than required by the circuitry inside the chip. A larger
frame is needed because pads can be placed only on three
sides of the frame. The frame could be smaller and the chip
area could be more effectively used if pads were placed on
all four sides of the frame. Both of these MacPitts designs
produced correct simulations when simulated by the event
driven switch level simulator, esim, using the procedure
outlined in [Ref. 4].

 1  ;adder 4-bit combinational
 2  (program add 4
 3    (def 1 ground)
 4    (def ain port input (2 3 4 5))       ;input vector
 5    (def bin port input (6 7 8 9))       ;input vector
 6    (def res port output (10 11 12 13))  ;output vector
 7    (def cin signal input 14)            ;carry in
 8    (def carry port internal)
 9    (def 15 phia)
10    (def 16 phib)
11    (def 17 phic)
12    (def 18 power)
13    (always
14      (cond (cin (setq carry 1))
15            (t (setq carry 0)))
16      (setq res (+ ain (+ bin carry)))))

Figure 3.2. 4-Bit Combinational Adder .mac Program
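
The behavior of this design, as described in the text, can be modelled in a few lines of Python. This is a behavioral sketch only, not a simulation of the MacPitts layout; the function name and the use of Python integers as data words are assumptions made for this example.

```python
def macpitts_style_add(ain, bin_, cin, word_bits=4):
    """Model res = (+ ain (+ bin carry)) on word_bits-wide words."""
    mask = (1 << word_bits) - 1
    # The one-bit carry in signal is widened to a word:
    # only the least significant bit can be set.
    carry_word = 1 if cin else 0
    # First adder: bin + carry. Its carry out is not connected.
    inner = (bin_ + carry_word) & mask
    # Second adder: ain + inner. Its carry out is not connected either.
    return (ain + inner) & mask


if __name__ == "__main__":
    print(macpitts_style_add(3, 4, True))    # 3 + 4 + 1
    print(macpitts_style_add(15, 1, False))  # sum wraps; carry out is lost
```

The second call shows why a chip carry out has to be generated separately: the word result wraps around and the overflow is not visible on the output port.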

B. PIPELINE ADDERS 

The purpose of pipelining a circuit is to increase the 
throughput of the circuit. The combinational logic of a 
circuit is partitioned into several smaller functional units 
or stages and storage registers are placed between each 
stage. During each clock period data is clocked from the 
input storage register of each stage through the 
combinational logic of the stage and into the output storage 
register of the stage. Also, during each clock period a 
result exits the pipeline. Since the combinational logic in 
each stage of the pipeline circuit is less than the total
combinational logic of the combinational circuit the
pipeline circuit has a shorter logic propagation delay 
during each clock period. This allows the pipeline circuit 
to operate at a faster clock speed and higher data output 
rate (throughput) than the combinational circuit. A 
disadvantage of a pipeline circuit is the latency caused by 
the time that is required to fill and empty the pipeline. 
Reference 9 should be consulted for more information on 
pipelining. 
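
The trade-off between throughput and latency can be illustrated with a small calculation. The stage delays below are hypothetical numbers chosen for the example, not measurements of any MacPitts design, and the model ignores register overhead, so the clock period is simply the slowest stage delay.

```python
def pipeline_timing(stage_delays_ns):
    """Return (clock_period, latency, throughput) for a pipeline.

    Simplifying assumption: register overhead is ignored, so the clock
    period equals the slowest stage delay. Throughput is in results
    per nanosecond.
    """
    period = max(stage_delays_ns)
    latency = period * len(stage_delays_ns)
    return period, latency, 1.0 / period


if __name__ == "__main__":
    delays = [20, 30, 25, 30, 15]  # hypothetical stage delays in ns
    print(pipeline_timing(delays))         # five-stage pipeline
    print(pipeline_timing([sum(delays)]))  # same logic, unpipelined
```

In this example the pipelined circuit delivers a result every 30 ns instead of every 120 ns, but the first result appears only after 150 ns, which is the fill latency mentioned above.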

There are many different algorithms that can be used to
design a pipeline adder circuit. The block carry-look-ahead
(BCLA) addition algorithm [Ref. 10] was used so that a
comparison could be made between a MacPitts designed
pipeline adder circuit and the hand-crafted pipeline adder
circuit designed by Conradi and Hauenstein [Ref. 5].
Equations 6.1 thru 6.12 of [Ref. 5] are used to implement
the BCLA addition algorithm. As described in [Ref. 5], the
BCLA pipeline adder can be conveniently divided into the
following five stages:

1. Calculate the carry generate ($G_i$) and the carry
propagate ($P_i$) from the input addition operands.

$$G_i = A_i B_i$$

$$P_i = A_i \oplus B_i$$

2. Calculate one block generate ($BG_j$) and block propagate
($BP_j$) for every four bits of the addition operands
from the $G_i$'s and $P_i$'s.

$$BP_j = P_{i+3} P_{i+2} P_{i+1} P_i, \quad j = 0, 1, 2, 3, \ldots; \; i = 4j$$

$$BG_j = G_{i+3} + G_{i+2} P_{i+3} + G_{i+1} P_{i+2} P_{i+3} + G_i P_{i+1} P_{i+2} P_{i+3}, \quad j = 0, 1, 2, 3, \ldots; \; i = 4j$$

3. Calculate the block carry ($BC_i$) for each carry block.

$$BC_3 = BG_0 + C_{-1} BP_0$$

$$BC_7 = BG_1 + BG_0 BP_1 + C_{-1} BP_0 BP_1$$

$$BC_{11} = BG_2 + BG_1 BP_2 + BG_0 BP_1 BP_2 + C_{-1} BP_0 BP_1 BP_2$$

4. Calculate the look-ahead-carry ($C_i$) for each bit of the
operands.

$$C_{0,4,8,12} = G_{0,4,8,12} + BC_{-1,3,7,11} P_{0,4,8,12}$$

$$C_{1,5,9,13} = G_{1,5,9,13} + G_{0,4,8,12} P_{1,5,9,13} + BC_{-1,3,7,11} P_{0,4,8,12} P_{1,5,9,13}$$

$$C_{2,6,10,14} = G_{2,6,10,14} + G_{1,5,9,13} P_{2,6,10,14} + G_{0,4,8,12} P_{1,5,9,13} P_{2,6,10,14} + BC_{-1,3,7,11} P_{0,4,8,12} P_{1,5,9,13} P_{2,6,10,14}$$

(Note that $C_{i+3} = BC_{i+3}$, $i = 0, 4, 8, 12$.)

5. Calculate the sum bits ($S_i$).

$$S_i = A_i \oplus B_i \oplus C_{i-1}$$
The Conradi and Hauenstein [Ref. 5] pipeline adder had
only four stages. Stages 1 and 2 were combined by writing
the equations describing the $BG_j$'s and $BP_j$'s in terms
of the input operands instead of in terms of the $G_i$'s and
$P_i$'s. The MacPitts pipeline adders contain five stages
because the increased stage propagation delay caused by
combining stages 1 and 2 could slow the clock speed of the
circuit and the fastest possible clock speed is desired.

Figure 3.5 shows the .mac program for a 4-bit 5-stage
pipeline adder circuit. The carry in of the chip is used in
all stages of the pipeline so a separate storage location is
required for each stage as shown in lines 26 thru 29. The
carry generate calculated in stage 1 is used through stage 4
and the carry propagate through stage 5, so multiple storage
locations are also used for these quantities. The
calculations of the carry generate and carry propagate,
shown in lines 43 and 44, are performed with the word-xor
and the word-and functions in the data-path. All functions
in stages 2 thru 5 require the manipulation of bit size data
elements. These functions are performed in a large
Weinberger array. Registers and flags are used to store the
input and output data of each stage. It takes less circuitry
(and fewer statements in the .mac program) to manipulate
word size data elements and store them in registers than to
manipulate bit size data elements and store them in flags.
Since there is no MacPitts function to set the bits of a
word to a particular value the bit sized output data
elements of stages 2 through 4 cannot be combined into words
and stored in registers. The output data of stages 2 through
4 must be stored in flags and this requires a very large
flags block.

 1  (program addp 4
 2  ;This adder uses block carry lookahead (BCLA) addition
 3    (def 1 ground)
 4    (def ain port input (2 3 4 5))   ;input vector
 5    (def bin port input (6 7 8 9))   ;input vector
 6    (def cin signal input 10)        ;carry into chip
 7    (def sum3 signal output 11)      ;bit 3 sum
 8    (def sum2 signal output 12)      ;bit 2 sum
 9    (def sum1 signal output 13)      ;bit 1 sum
10    (def sum0 signal output 14)      ;bit 0 sum
11    (def cout signal output 15)      ;carry out of chip
12    (def 16 phia)                    ;clock phases
13    (def 17 phib)
14    (def 18 phic)
15    (def 19 power)
16    (def p1 register)                ;carry propagate-stage one
17    (def p2 register)                ;               -stage two
18    (def p3 register)                ;               -stage three
19    (def p4 register)                ;               -stage four
20    (def g1 register)                ;carry generate-stage one
21    (def g2 register)                ;              -stage two
22    (def g3 register)                ;              -stage three
23    (def bp0 flag)                   ;block carry propagate
24    (def bg0 flag)                   ;block carry generate
25    (def bc3 flag)                   ;block carry
26    (def carry1 flag)                ;cin-stage one
27    (def carry2 flag)                ;   -stage two
28    (def carry3 flag)                ;   -stage three
29    (def carry4 flag)                ;   -stage four
30    (def c0 flag)                    ;bit 0 carry
31    (def c1 flag)                    ;bit 1 carry
32    (def c2 flag)                    ;bit 2 carry
33    (def c3 flag)                    ;bit 3 carry
34    (def carryout flag)              ;cout flag
35    (def add0 flag)                  ;bit sum flags
36    (def add1 flag)
37    (def add2 flag)
38    (def add3 flag)
39    (always
40  ;
41  ;Stage One
42  ;
43      (par (setq p1 (word-xor ain bin))
44           (setq g1 (word-and ain bin))
45           (cond (cin (setq carry1 t))
46                 (t (setq carry1 f))))
47  ;
48  ;Stage Two
49  ;
50      (par (setq bp0 (and (bit 3 p1) (bit 2 p1) (bit 1 p1) (bit 0 p1)))
51           (setq bg0 (or (bit 3 g1) (and (bit 2 g1) (bit 3 p1))
52                         (and (bit 1 g1) (bit 2 p1) (bit 3 p1))
53                         (and (bit 0 g1) (bit 1 p1) (bit 2 p1) (bit 3 p1))))
54           (setq p2 p1)
55           (setq g2 g1)
56           (setq carry2 carry1))
57  ;
58  ;Stage Three
59  ;
60      (par (setq bc3 (or bg0 (and carry2 bp0)))
61           (setq p3 p2)
62           (setq g3 g2)
63           (setq carry3 carry2))
64  ;
65  ;Stage Four
66  ;
67      (par (setq c0 (or (bit 0 g3) (and carry3 (bit 0 p3))))
68           (setq c1 (or (bit 1 g3) (and (bit 0 g3) (bit 1 p3))
69                        (and carry3 (bit 0 p3) (bit 1 p3))))
70           (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3)) (and (bit 0 g3) (bit 1 p3) (bit 2 p3))
71                        (and carry3 (bit 0 p3) (bit 1 p3) (bit 2 p3))))
72           (setq p4 p3)
73           (setq c3 bc3)
74           (setq carry4 carry3))
75  ;
76  ;Stage Five
77  ;
78      (par (setq add0 (xor (bit 0 p4) carry4))
79           (setq add1 (xor (bit 1 p4) c0))
80           (setq add2 (xor (bit 2 p4) c1))
81           (setq add3 (xor (bit 3 p4) c2))
82           (setq carryout c3)
83           (setq sum0 add0)
84           (setq sum1 add1)
85           (setq sum2 add2)
86           (setq sum3 add3)
87           (setq cout carryout))))

Figure 3.5. MacPitts .mac Program for a 4-Bit 5-Stage
Pipeline Adder Circuit

A pipeline circuit designed by the MacPitts compiler 
does not perform like a standard pipeline circuit as 
described in [Ref. 9] because the input data of each stage 
is valid before the start of the clock period. When data is 
stored in a MacPitts register or flag the data is valid on 
the register or flag output line before the end of the clock 
period (see Figures 2.2, 2.7, and 2.8). The data then starts 
to propagate through the combinational logic of the next 
stage before the start of the next clock period. During the 
next clock period the data will continue to propagate 
through the stage combinational logic during the first two
clock period segments and will then be stored in registers
or flags during the third clock period segment.

Since the output data of stage 4 of the pipeline
circuit starts propagating through the logic of stage 5
before the start of the next clock period, it is possible
that the data could propagate all the way through the logic
of stage 5 and reach the output pads of the chip at or
before the start of the next clock period. This causes the
fifth stage of the pipeline to be essentially part of the
fourth stage. For comparison with the Conradi and Hauenstein
adder [Ref. 5] it is desired that the fifth stage of the
circuit be separate from the fourth. To ensure that there
are five stages in the pipeline instead of four, the sum
bits and carry out bit of the adder operands calculated in
stage 5 (lines 78 through 82) are stored in flags, and the
outputs of the flags then propagate to the output pins
(lines 83 through 87) at the end of the clock period.

The layout of a 4 micron 4-bit 5-stage pipeline adder,
which uses only one carry-look-ahead block, is shown in
Figure 3.6. Figure 3.7 shows a block diagram of this
circuit. The size of the circuit is 4.828mm x 2.918mm. As
shown, the circuit has a large Weinberger array and flags
block. The long flags block takes up almost half the width
of the chip and causes a lot of wasted area on the right of
the Weinberger array. An 8-bit 5-stage pipeline adder will
contain two carry-look-ahead blocks. The .mac program of the
8-bit adder is shown in Figure 3.8 and the circuit layout is
shown in Figure 3.9. The block diagram of the 8-bit adder
would be the same as the block diagram of the 4-bit adder
shown in Figure 3.7. The size of the 8-bit adder circuit is
6.650mm x 4.358mm. The data-path is twice as tall, the flags
block is almost twice as long and the area of the Weinberger
array is four times larger in the 8-bit adder than in the
4-bit adder.

Figure 3.6. Layout of a 4-Bit 5-Stage Pipeline Adder Circuit

Figure 3.7. Block Diagram of a 4-Bit 5-Stage Pipeline Adder Circuit

An attempt was made to design a 16-bit 5-stage pipeline 
adder with the MacPitts compiler. The compiler was able to 
design all but the large Weinberger array which is four 
times larger than the 8-bit adder array. The program that 
designs the Weinberger array uses a recursive algorithm and 
the depth of recursion is limited by the amount of memory 
available to the LISP compiler. Since the array of the 
16-bit adder circuit is so large the limit of the depth of 
recursion was reached. 

The 16-bit pipeline adder contains four carry-look-ahead
blocks. When the .mac program of the 16-bit adder (Figure
3.10) is compared to the .mac programs of the 8-bit and
4-bit adders (Figures 3.6 and 3.9) the programs are
essentially the same except for additional statements in
stages 2 through 5 due to the larger 16-bit data word and
due to the additional carry-look-ahead blocks.

(program addp 8
;This adder uses block carry lookahead (BCLA) addition
(def 1 ground)
(def ain port input (2 3 4 5 6 7 8 9))          ;input vector
(def bin port input (10 11 12 13 14 15 16 17))  ;input vector
(def cin signal input 18)                       ;carry into chip
(def sum7 signal output 19)                     ;bit 7 sum
(def sum6 signal output 20)                     ;bit 6 sum
(def sum5 signal output 21)                     ;bit 5 sum
(def sum4 signal output 22)                     ;bit 4 sum
(def sum3 signal output 23)                     ;bit 3 sum
(def sum2 signal output 24)                     ;bit 2 sum
(def sum1 signal output 25)                     ;bit 1 sum
(def sum0 signal output 26)                     ;bit 0 sum
(def cout signal output 27)                     ;carry out of chip
(def 28 phia)                                   ;clock phases
(def 29 phib)
(def 30 phic)
(def 31 power)
(def p1 register)                ;carry propagate-stage one
(def p2 register)                ;               -stage two
(def p3 register)                ;               -stage three
(def p4 register)                ;               -stage four
(def g1 register)                ;carry generate-stage one
(def g2 register)                ;              -stage two
(def g3 register)                ;              -stage three
(def bp0 flag)                   ;block carry propagate
(def bp1 flag)
(def bg0 flag)                   ;block carry generate
(def bg1 flag)
(def bc3 flag)                   ;block carry
(def bc7 flag)
(def carry1 flag)                ;cin-stage one
(def carry2 flag)                ;   -stage two
(def carry3 flag)                ;   -stage three
(def carry4 flag)                ;   -stage four
(def c0 flag)                    ;bit 0 carry
(def c1 flag)                    ;bit 1 carry
(def c2 flag)                    ;bit 2 carry
(def c3 flag)                    ;bit 3 carry
(def c4 flag)                    ;bit 4 carry
(def c5 flag)                    ;bit 5 carry
(def c6 flag)                    ;bit 6 carry
(def c7 flag)                    ;bit 7 carry
(def carryout flag)              ;cout flag
(def add0 flag)                  ;bit sum flags
(def add1 flag)
(def add2 flag)
(def add3 flag)
(def add4 flag)
(def add5 flag)
(def add6 flag)
(def add7 flag)
(always
;
;Stage One
;
  (par (setq p1 (word-xor ain bin))
       (setq g1 (word-and ain bin))
       (cond (cin (setq carry1 t))
             (t (setq carry1 f))))
;
;Stage Two
;
  (par (setq bp0 (and (bit 3 p1) (bit 2 p1) (bit 1 p1) (bit 0 p1)))
       (setq bp1 (and (bit 7 p1) (bit 6 p1) (bit 5 p1) (bit 4 p1)))
       (setq bg0 (or (bit 3 g1) (and (bit 2 g1) (bit 3 p1))
                     (and (bit 1 g1) (bit 2 p1) (bit 3 p1))
                     (and (bit 0 g1) (bit 1 p1) (bit 2 p1) (bit 3 p1))))
       (setq bg1 (or (bit 7 g1) (and (bit 6 g1) (bit 7 p1))
                     (and (bit 5 g1) (bit 6 p1) (bit 7 p1))
                     (and (bit 4 g1) (bit 5 p1) (bit 6 p1) (bit 7 p1))))
       (setq p2 p1)
       (setq g2 g1)
       (setq carry2 carry1))
;
;Stage Three
;
  (par (setq bc3 (or bg0 (and carry2 bp0)))
       (setq bc7 (or bg1 (and bg0 bp1) (and carry2 bp0 bp1)))
       (setq p3 p2)
       (setq g3 g2)
       (setq carry3 carry2))
;
;Stage Four
;
  (par (setq c0 (or (bit 0 g3) (and carry3 (bit 0 p3))))
       (setq c1 (or (bit 1 g3) (and (bit 0 g3) (bit 1 p3))
                    (and carry3 (bit 0 p3) (bit 1 p3))))
       (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3))
                    (and carry3 (bit 0 p3) (bit 1 p3) (bit 2 p3))))
       (setq c3 bc3)
       (setq c4 (or (bit 4 g3) (and bc3 (bit 4 g3))))
       (setq c5 (or (bit 5 g3) (and (bit 4 g3) (bit 5 p3))
                    (and bc3 (bit 4 p3) (bit 5 p3))))
       (setq c6 (or (bit 6 g3) (and (bit 5 g3) (bit 6 p3))
                    (and (bit 4 g3) (bit 5 p3) (bit 6 p3))
                    (and bc3 (bit 4 p3) (bit 5 p3) (bit 6 p3))))
       (setq c7 bc7)
       (setq p4 p3)
       (setq carry4 carry3))
;
;Stage Five
;
  (par (setq add0 (xor (bit 0 p4) carry4))
       (setq add1 (xor (bit 1 p4) c0))
       (setq add2 (xor (bit 2 p4) c1))
       (setq add3 (xor (bit 3 p4) c2))
       (setq add4 (xor (bit 4 p4) c3))
       (setq add5 (xor (bit 5 p4) c4))
       (setq add6 (xor (bit 6 p4) c5))
       (setq add7 (xor (bit 7 p4) c6))
       (setq carryout c7)
       (setq sum0 add0)
       (setq sum1 add1)
       (setq sum2 add2)
       (setq sum3 add3)
       (setq sum4 add4)
       (setq sum5 add5)
       (setq sum6 add6)
       (setq sum7 add7)
       (setq cout carryout))))

Figure 3.8. MacPitts .mac Program for an 8-Bit 5-Stage
Pipeline Adder Circuit

(program addp 16
;This adder uses block carry lookahead (BCLA) addition
(def 1 ground)
(def ain port input (2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17))
(def bin port input (18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33))
(def cin signal input 34)
(def sum15 signal output 35)
(def sum14 signal output 36)
(def sum13 signal output 37)
(def sum12 signal output 38)
(def sum11 signal output 39)
(def sum10 signal output 40)
(def sum9 signal output 41)
(def sum8 signal output 42)
(def sum7 signal output 43)
(def sum6 signal output 44)
(def sum5 signal output 45)
(def sum4 signal output 46)
(def sum3 signal output 47)
(def sum2 signal output 48)
(def sum1 signal output 49)
(def sum0 signal output 50)
(def cout signal output 51)
(def 52 phia)
(def 53 phib)
(def 54 phic)
(def 55 power)
(def p1 register)
(def p2 register)
(def p3 register)
(def p4 register)
(def g1 register)
(def g2 register)
(def g3 register)
(def bp0 flag)
(def bp1 flag)
(def bp2 flag)
(def bp3 flag)
(def bg0 flag)
(def bg1 flag)
(def bg2 flag)
(def bg3 flag)
(def bc3 flag)
(def bc7 flag)
(def bc11 flag)
(def bc15 flag)
(def carry1 flag)
(def carry2 flag)
(def carry3 flag)
(def carry4 flag)
(def c0 flag)
(def c1 flag)
(def c2 flag)
(def c3 flag)
(def c4 flag)
(def c5 flag)
(def c6 flag)
(def c7 flag)
(def c8 flag)
(def c9 flag)
(def c10 flag)
(def c11 flag)
(def c12 flag)
(def c13 flag)
(def c14 flag)
(def c15 flag)
(def carryout flag)
(def add0 flag)
(def add1 flag)
(def add2 flag)
(def add3 flag)
(def add4 flag)
(def add5 flag)
(def add6 flag)
(def add7 flag)
(def add8 flag)
(def add9 flag)
(def add10 flag)
(def add11 flag)
(def add12 flag)
(def add13 flag)
(def add14 flag)
(def add15 flag)
(always
;
;Stage One
;
  (par (setq p1 (word-xor ain bin))
       (setq g1 (word-and ain bin))
       (cond (cin (setq carry1 t))
             (t (setq carry1 f))))
;
;Stage Two
;
  (par (setq bp0 (and (bit 3 p1) (bit 2 p1) (bit 1 p1) (bit 0 p1)))
       (setq bp1 (and (bit 7 p1) (bit 6 p1) (bit 5 p1) (bit 4 p1)))
       (setq bp2 (and (bit 11 p1) (bit 10 p1) (bit 9 p1) (bit 8 p1)))
       (setq bp3 (and (bit 15 p1) (bit 14 p1) (bit 13 p1)
                      (bit 12 p1)))
       (setq bg0 (or (bit 3 g1) (and (bit 2 g1) (bit 3 p1))
                     (and (bit 1 g1) (bit 2 p1) (bit 3 p1))
                     (and (bit 0 g1) (bit 1 p1) (bit 2 p1) (bit 3 p1))))
       (setq bg1 (or (bit 7 g1) (and (bit 6 g1) (bit 7 p1))
                     (and (bit 5 g1) (bit 6 p1) (bit 7 p1))
                     (and (bit 4 g1) (bit 5 p1) (bit 6 p1) (bit 7 p1))))
       (setq bg2 (or (bit 11 g1) (and (bit 10 g1) (bit 11 p1))
                     (and (bit 9 g1) (bit 10 p1) (bit 11 p1))
                     (and (bit 8 g1) (bit 9 p1) (bit 10 p1) (bit 11 p1))))
       (setq bg3 (or (bit 15 g1) (and (bit 14 g1) (bit 15 p1))
                     (and (bit 13 g1) (bit 14 p1) (bit 15 p1))
                     (and (bit 12 g1) (bit 13 p1) (bit 14 p1) (bit 15 p1))))
       (setq p2 p1)
       (setq g2 g1)
       (setq carry2 carry1))
;
;Stage Three
;
  (par (setq bc3 (or bg0 (and carry2 bp0)))
       (setq bc7 (or bg1 (and bg0 bp1) (and carry2 bp0 bp1)))
       (setq bc11 (or bg2 (and bg1 bp2) (and bg0 bp1 bp2)
                      (and carry2 bp0 bp1 bp2)))
       (setq bc15 (or bg3 (and bg2 bp3) (and bg1 bp2 bp3)
                      (and bg0 bp1 bp2 bp3) (and carry2 bp0 bp1 bp2 bp3)))
       (setq p3 p2)
       (setq g3 g2)
       (setq carry3 carry2))
;
;Stage Four
;
  (par (setq c0 (or (bit 0 g3) (and carry3 (bit 0 p3))))
       (setq c1 (or (bit 1 g3) (and (bit 0 g3) (bit 1 p3))
                    (and carry3 (bit 0 p3) (bit 1 p3))))
       (setq c2 (or (bit 2 g3) (and (bit 1 g3) (bit 2 p3))
                    (and carry3 (bit 0 p3) (bit 1 p3) (bit 2 p3))))
       (setq c3 bc3)
       (setq c4 (or (bit 4 g3) (and bc3 (bit 4 g3))))
       (setq c5 (or (bit 5 g3) (and (bit 4 g3) (bit 5 p3))
                    (and bc3 (bit 4 p3) (bit 5 p3))))
       (setq c6 (or (bit 6 g3) (and (bit 5 g3) (bit 6 p3))
                    (and (bit 4 g3) (bit 5 p3) (bit 6 p3))
                    (and bc3 (bit 4 p3) (bit 5 p3) (bit 6 p3))))
       (setq c7 bc7)
       (setq c8 (or (bit 8 g3) (and bc7 (bit 8 p3))))
       (setq c9 (or (bit 9 g3) (and (bit 8 g3) (bit 9 p3))
                    (and bc7 (bit 8 p3) (bit 9 p3))))
       (setq c10 (or (bit 10 g3) (and (bit 9 g3) (bit 10 p3))
                     (and (bit 8 g3) (bit 9 p3) (bit 10 p3))
                     (and bc7 (bit 8 p3) (bit 9 p3) (bit 10 p3))))
       (setq c11 bc11)
       (setq c12 (or (bit 12 g3) (and bc11 (bit 12 p3))))
       (setq c13 (or (bit 13 g3) (and (bit 12 g3) (bit 13 p3))
                     (and bc11 (bit 12 p3) (bit 13 p3))))
       (setq c14 (or (bit 14 g3) (and (bit 13 g3) (bit 14 p3))
                     (and (bit 12 g3) (bit 13 p3) (bit 14 p3))
                     (and bc11 (bit 12 p3) (bit 13 p3) (bit 14 p3))))
       (setq c15 bc15)
       (setq p4 p3)
       (setq carry4 carry3))
;
;Stage Five
;
  (par (setq add0 (xor (bit 0 p4) carry4))
       (setq add1 (xor (bit 1 p4) c0))
       (setq add2 (xor (bit 2 p4) c1))
       (setq add3 (xor (bit 3 p4) c2))
       (setq add4 (xor (bit 4 p4) c3))
       (setq add5 (xor (bit 5 p4) c4))
       (setq add6 (xor (bit 6 p4) c5))
       (setq add7 (xor (bit 7 p4) c6))
       (setq add8 (xor (bit 8 p4) c7))
       (setq add9 (xor (bit 9 p4) c8))
       (setq add10 (xor (bit 10 p4) c9))
       (setq add11 (xor (bit 11 p4) c10))
       (setq add12 (xor (bit 12 p4) c11))
       (setq add13 (xor (bit 13 p4) c12))
       (setq add14 (xor (bit 14 p4) c13))
       (setq add15 (xor (bit 15 p4) c14))
       (setq carryout c15)
       (setq sum0 add0)
       (setq sum1 add1)
       (setq sum2 add2)
       (setq sum3 add3)
       (setq sum4 add4)
       (setq sum5 add5)
       (setq sum6 add6)
       (setq sum7 add7)
       (setq sum8 add8)
       (setq sum9 add9)
       (setq sum10 add10)
       (setq sum11 add11)
       (setq sum12 add12)
       (setq sum13 add13)
       (setq sum14 add14)
       (setq sum15 add15)
       (setq cout carryout))))

Figure 3.10. MacPitts .mac Program for a 16-Bit
5-Stage Pipeline Adder Circuit

Initially, simulations on the 4-bit and 8-bit pipeline
adder circuits could not be performed due to numerous wiring
and alignment errors in the MacPitts designs. These errors
are discussed in Chapter V of this thesis. After all of the
wiring and alignment errors were corrected the two adders
produced correct simulations using esim.

IV. DESIGN PERFORMANCE COMPARISONS

A. TIMING ANALYSIS USING CRYSTAL

1. Introduction

Crystal is a VLSI circuit delay analysis program 
developed at the University of California at Berkeley. The 
slowest paths in the circuit are determined by Crystal and 
this information can be used to calculate the maximum clock 
speed of the circuit. Version 2 of Crystal found in the 
berkSS VLSI design tools available on the UNIX VAX computer 
system was used for all timing and delay analysis. 

Crystal reads circuit description information from a
.sim file created by the circuit extractor program Mextra
and then accepts commands typed by the programmer at the
terminal keyboard. There are seven categories of Crystal
commands and they must appear in the following order when a
timing analysis is performed: model commands, circuit
commands, dynamic node commands, check commands, setup
commands, delay commands and miscellaneous commands.
References 6 and 11 should be consulted for a complete
listing of all Crystal commands and their use. Output from
Crystal is written on the terminal screen and can be stored
in a file if the UNIX "script" command is executed before
the timing analysis is started.

2. Combinational Circuits

a. Performing a Delay Analysis

Combinational circuits are the easiest circuits
to analyze using Crystal. First, all input and output pads
should be labeled using the VLSI circuit editor Caesar. The
label can be any combination of distinctive ASCII characters
except space, tab, newline, double quote, comma, semi-colon
and parenthesis and must not start or end with a number.
Next a .sim file is created using Mextra with a -o option.
Only four commands, "inputs", "outputs", "delay" and
"critical", are necessary to analyze the circuit. The
commands "inputs" and "outputs" are used to identify the
input and output signals of the combinational circuit.
Delay commands are used to tell Crystal when input signals
change value [Ref. 6]. The form of the delay command is:

      delay (signal name) tr tf

where tr is the time that the signal will rise to 1 and tf
is the time that the signal will fall to 0. An example of a
delay command is:

      delay ain 3 0

This delay command specifies that the time that ain will
rise is 3ns and the time that ain will fall is 0ns. This
means that ain is initially set to 0 and will rise to 1 3ns
later. If a negative time is used in a delay statement, the
corresponding transition of that signal will not occur after
time 0. This allows the programmer to have input signals
stable at the start of the timing analysis. The command
"critical" directs Crystal to calculate the slowest path in
the circuit.
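
The delay command form described above can be generated mechanically. The following Python sketch builds delay lines from a rise time and a fall time, writing -1 for a transition that should not occur. The helper names delay_command and drive_to are assumptions made for this illustration and are not part of Crystal.

```python
def delay_command(signal, rise_ns=None, fall_ns=None):
    """Build a Crystal delay line; None means the transition never occurs."""
    def fmt(t):
        # A negative time tells Crystal that this transition does not occur.
        return "-1" if t is None else format(t, "g")
    return f"delay {signal} {fmt(rise_ns)} {fmt(fall_ns)}"


def drive_to(signal, value):
    """Set an input to 1 (rise at time 0) or to 0 (fall at time 0)."""
    if value:
        return delay_command(signal, rise_ns=0)
    return delay_command(signal, fall_ns=0)


if __name__ == "__main__":
    print(delay_command("ain", 3, 0))
    for name, value in (("ain", 1), ("bin", 0), ("cin", 1)):
        print(drive_to(name, value))
```

With the inputs set to 1, 0 and 1 this produces the same three delay commands that appear in lines 10, 16 and 19 of Figure 4.1.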

Two other commands, "check" and "clear", may also
be useful. The check command performs a static electrical
check on the circuit. Information about nodes with no
transistors connected to them, nodes that are not driven,
nodes that don't drive anything, transistors that are
permanently forced off, transistors connecting Vdd and GND,
and transistors that are bidirectional is printed to the
screen [Ref. 11]. Except for the information on the
bidirectional transistors, none of this information is very
useful in a MacPitts generated circuit. This is because
when the MacPitts silicon compiler does not use part of an
organelle in a chip design the unused circuitry is left in
the design, resulting in improperly connected nodes and
transistors. A bidirectional transistor is a transistor for
which Crystal cannot determine the direction of signal flow
within the transistor. To prevent Crystal from calculating
circuit delays along impossible paths, bidirectional
transistors must be labeled to show signal directions. (See
paragraph 3.a. below for directions on how to label
bidirectional transistors.)

The command "clear" is used to clear all previous
delay information and critical calculations from Crystal.
Information on inputs and outputs is not affected. When a
clear command is used new timing calculations can be made
based on new delay commands for the same circuit.

Figure 4.1 shows the sequence of commands used
to perform a timing analysis on a 1-bit combinational adder
circuit. A check for bidirectional transistors was
previously performed and none were found in this circuit.
Line 2 shows the command used to invoke Crystal and lines 6
and 8 identify the circuit inputs and outputs. The Crystal
output lines that are enclosed in brackets on lines 5 and 7
indicate that Crystal has completed execution of the
previous commands. Crystal outputs a line in brackets after
the execution of every command. In lines 10, 16 and 19 the
two input bits, ain and bin, and the input carry bit, cin,
are set to 1, 0 and 1, respectively, with delay commands. In
lines 14, 17 and 20 Crystal indicates the number of stages
that had to be examined to determine the timing delay for
each signal. After the delay commands, the critical command
is given in line 22. Lines 23 through 55 show the time
delay through the critical path in the circuit. Each node
that is in the critical path is identified with the time
that it is driven. In this case the critical path starts at
input pad bin, goes through the combinational logic in the
data-path and then ends at the output pad res 198.12ns later
(see Figure 4.2).

  1 % script
  2 % crystal addcl.sim
  3 Crystal, v.2
  4 : build addcl.sim
  5 [0:00.7u 0:00.2s 30k]
  6 : inputs ain bin cin
  7 [0:00.0u 0:00.0s 39k]
  8 : outputs res
  9 [0:00.0u 0:00.0s 39k]
 10 : delay ain 0 -1
 11 Marking transistor flow...
 12 Setting Vdd to 1...
 13 Setting GND to 0...
 14 (28 stages examined.)
 15 [0:00.2u 0:00.1s 48k]
 16 : delay bin -1 0
 17 (41 stages examined.)
 18 [0:00.1u 0:00.1s 54k]
 19 : delay cin 0 -1
 20 (26 stages examined.)
 21 [0:00.0u 0:00.0s 54k]
 22 : critical
 23 Node res is driven high at 198.12ns
 24 ...through fet at (885, 525) to Vdd after
 25 342 is driven high at 189.31ns
 26 ...through fet at (870, 457) to Vdd after
 27 357 is driven low at 179.77ns
 28 ...through fet at (849, 505) to GND after
 29 139 is driven high at 171.36ns
 30 ...through fet at (730, 387) to Vdd after
 31 258 is driven low at 85.70ns
 32 ...through fet at (668, 381) to 233
 33 ...through fet at (668, 376) to GND after
 34 221 is driven high at 81.04ns
 35 ...through fet at (623, 385) to Vdd after
 36 240 is driven low at 72.31ns
 37 ...through fet at (561, 379) to 225
 38 ...through fet at (561, 374) to GND after
 39 171 is driven high at 66.64ns
 40 ...through fet at (454, 387) to Vdd after
 41 255 is driven low at 48.79ns
 42 ...through fet at (392, 381) to 231
 43 ...through fet at (392, 376) to GND after
 44 219 is driven high at 44.13ns
 45 ...through fet at (347, 385) to Vdd after
 46 237 is driven low at 35.35ns
 47 ...through fet at (285, 379) to 223
 48 ...through fet at (281, 374) to GND after
 49 141 is driven high at 30.53ns
 50 ...through fet at (264, 381) to Vdd after
 51 119 is driven low at 13.17ns
 52 ...through fet at (474, 603) to GND after
 53 401 is driven high at 8.66ns
 54 ...through fet at (518, 593) to Vdd after
 55 bin is driven low at 0.00ns
 56 [0:00.1u 0:00.2s 54k]
 57 : clear
 58 [0:00.0u 0:00.0s 54k]
 59 : delay ain -1 0
 60 Marking transistor flow...
 61 Setting Vdd to 1...
 62 Setting GND to 0...
 63 (26 stages examined.)
 64 [0:00.1u 0:00.1s 60k]
 65 : delay bin 0 -1
 66 (52 stages examined.)
 67 [0:00.1u 0:00.1s 63k]
 68 : delay cin -1 0
 69 (61 stages examined.)
 70 [0:00.2u 0:00.0s 63k]
 71 : critical
 72 Node res is driven high at 226.63ns
 73 ...through fet at (885, 525) to Vdd after
 74 342 is driven high at 217.82ns
 75 ...through fet at (870, 457) to Vdd after
 76 357 is driven low at 208.28ns
 77 ...through fet at (849, 505) to GND after
 78 139 is driven high at 199.87ns
 79 ...through fet at (730, 387) to Vdd after
 80 258 is driven low at 114.21ns
 81 ...through fet at (668, 381) to 233
 82 ...through fet at (668, 376) to GND after
 83 221 is driven high at 109.55ns
 84 ...through fet at (623, 385) to Vdd after
 85 240 is driven low at 100.82ns
 86 ...through fet at (561, 379) to 225
 87 ...through fet at (561, 374) to GND after
 88 171 is driven high at 95.15ns
 89 ...through fet at (454, 387) to Vdd after
 90 255 is driven low at 77.30ns
 91 ...through fet at (392, 381) to 231
 92 ...through fet at (392, 376) to GND after
 93 219 is driven high at 72.64ns
 94 ...through fet at (347, 385) to Vdd after
 95 237 is driven low at 63.86ns
 96 ...through fet at (285, 379) to 223
 97 ...through fet at (281, 374) to GND after
 98 141 is driven high at 59.04ns
 99 ...through fet at (264, 381) to Vdd after
100 170 is driven low at 42.03ns
101 ...through fet at (188, 372) to GND after
102 63 is driven high at 28.83ns
103 ...through fet at (182, 189) to Vdd after
104 36 is driven low at 15.52ns
105 ...through fet at (885, 304) to GND after
106 148 is driven high at 8.65ns
107 ...through fet at (875, 408) to Vdd after
108 cin is driven low at 0.00ns
109 : quit

Figure 4.1. Crystal Delay Analysis of a 1-Bit
Combinational Adder

Figure 4.2. 1-Bit Combinational Adder

In line 57 the clear command is used so that a new 
timing delay analysis on the same circuit can be made. In 
lines 59, 65 and 68 new delay commands set ain, bin and cin 
to 0, 1 and 0 respectively. The critical command is given on 
line 71 and new critical path information is shown on lines 
72 through 108. This time the critical path starts at the 
input pad cin, goes through the Weinberger array and the 
combinational logic in the data-path and ends at the output 
pad res 226.63ns later. After finishing a Crystal timing 
analysis the command "quit" should be used to exit the Crystal 
program. 

As can be seen from the timing analysis of the 
1-bit combinational adder, the longest critical path occurs 
when cin is driven to a low state. This is because the cin 
signal must travel through the Weinberger array and the first 
organelle in the data-path. This circuitry is normally at a 
high state unless brought low by a low cin. A high cin causes 
no level transitions so there is no delay through the 
circuitry. For a low cin there is a low transition that 
takes approximately 30ns to propagate through the Weinberger 
array and the first organelle in the data-path. 
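
The per-node delays along a critical path can be read off the Crystal output by subtracting successive node times. The Python sketch below parses the "is driven high/low at" lines of a critical path listing and reports the time spent between consecutive nodes. The sample text is copied from lines 23 through 31 of Figure 4.1; the function names parse_critical_path and stage_delays are assumptions for this illustration only.

```python
import re

# Matches lines such as "Node res is driven high at 198.12ns"
# and "342 is driven high at 189.31ns".
NODE_LINE = re.compile(
    r"^(?:Node\s+)?(\S+) is driven (high|low) at ([0-9.]+)ns$"
)


def parse_critical_path(text):
    """Return (node, level, time_ns) tuples ordered from input to output."""
    events = []
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2),
                           float(match.group(3))))
    events.reverse()  # Crystal lists the output node first
    return events


def stage_delays(events):
    """Return (node, delay_ns) for each node after the first one."""
    return [(later[0], round(later[2] - earlier[2], 2))
            for earlier, later in zip(events, events[1:])]


SAMPLE = """\
Node res is driven high at 198.12ns
...through fet at (885, 525) to Vdd after
342 is driven high at 189.31ns
...through fet at (870, 457) to Vdd after
357 is driven low at 179.77ns
...through fet at (849, 505) to GND after
139 is driven high at 171.36ns
...through fet at (730, 387) to Vdd after
258 is driven low at 85.70ns
"""

if __name__ == "__main__":
    for node, delay in stage_delays(parse_critical_path(SAMPLE)):
        print(f"{node}: {delay:.2f}ns")
```

On this excerpt the largest single step is the 85.66ns between node 258 and node 139, which accounts for a large share of the total path delay.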

If the -g (filename) option is used with the
critical command [Ref. 11] the critical path timing information
is printed in (filename) in a format that can be accessed by
Caesar using the Caesar "source" command. Each node in the
critical path is identified on the Caesar display screen
along with the timing delay information. Figure 4.3 shows
an example of how the timing delay information is displayed.

b. Validation of Crystal's Timing Data

(1) Introduction. Prior to this research
effort there had been no experience at the Naval Postgraduate
School in using Crystal to analyze circuits. The accuracy of
the results produced by Crystal was not known. In order to
gain confidence in Crystal a complete timing analysis of the
1-bit combinational adder previously analyzed by Crystal was
performed using the Mead-Conway guidelines in [Ref. 12].

The critical path found by Crystal was used
to determine which transistors in the circuit were on. The
delay calculations are divided into logic delays, wire
delays and pad delays.

(2) Logic Delays. The following equations were
used to calculate the logic delay in the circuit:

      Tpt = 2t
      Tinv = fkt
      Tnand = 2fkt

where Tpt is the delay for a pass transistor, Tinv is the
delay for an inverter, Tnand is the delay for a nand gate, t
is the signal transit time, f is the gate fanout, and k is
the pull-up to pull-down transistor ratio. Table I shows
that the total logic delay, in terms of the signal transit
time, is 142t. Reference 12 states that the signal transit
time equals 0.3ns for a six micron design (lambda equals 3
microns) and the 1-bit combinational adder is a 4 micron
design (lambda equals 2 microns). The transit time is scaled
down by dividing by the scale factor 1.5 (6 microns divided
by 4 microns). This gives a transit time of 0.2ns. Using
this value, a logic delay of 28ns is obtained. This value
is doubled to account for stray capacitance in the circuit
giving a total logic delay of 56ns.

Figure 4.3. Caesar Display of Crystal Generated
Timing Delay Information

                          TABLE I

            RESULTS OF LOGIC DELAY CALCULATIONS

LOGIC ELEMENT    |  k  |  f  | GATE DELAY | # OF GATES | TOTAL DELAY
-----------------+-----+-----+------------+------------+------------
inverter         |  4  |  1  |     4t     |     1      |     4t
pass transistor  | --  | --  |     2t     |     1      |     2t
nand gate        |  8  |  3  |    48t     |     1      |    48t
nand gate        |  4  |  2  |    16t     |     3      |    48t
nand gate        |  4  |  1  |     8t     |     5      |    40t

TOTAL LOGIC DELAY = 142t
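
The logic delay arithmetic can be re-derived with a few lines of Python. This is only a check of the figures quoted from Table I, using the three delay equations given above; the variable and function names are assumptions introduced for the illustration. Note that the text rounds 28.4ns to 28ns before doubling, so the unrounded result here is slightly larger than 56ns.

```python
def t_pass():
    """Pass transistor delay in units of the transit time t."""
    return 2


def t_inv(f, k):
    """Inverter delay in units of t for fanout f and ratio k."""
    return f * k


def t_nand(f, k):
    """Nand gate delay in units of t for fanout f and ratio k."""
    return 2 * f * k


# (gate delay in units of t, number of gates on the critical path)
GATES = [
    (t_inv(f=1, k=4), 1),
    (t_pass(), 1),
    (t_nand(f=3, k=8), 1),
    (t_nand(f=2, k=4), 3),
    (t_nand(f=1, k=4), 5),
]

total_t = sum(delay * count for delay, count in GATES)

# Transit time for a 6 micron design scaled to a 4 micron design.
transit_ns = 0.3 / (6 / 4)
logic_ns = total_t * transit_ns
with_stray_ns = 2 * logic_ns  # doubled for stray capacitance

if __name__ == "__main__":
    print(f"total logic delay = {total_t}t")
    print(f"transit time      = {transit_ns:.2f}ns")
    print(f"logic delay       = {logic_ns:.1f}ns")
    print(f"with stray cap.   = {with_stray_ns:.1f}ns")
```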

(3) Wire Delays. From Figure 4.2 it can be seen
that there are long metal and polysilicon runs in the circuit.
The total length of metal runs from the input pad to the
Weinberger array and from the data-path to the output pad is
approximately 3.9mm. The total length of polysilicon runs
from the input pad to the Weinberger array, from the Weinberger
array to the data-path and from the data-path to the output
pad is approximately 2.1mm. There are no significant
diffusion runs in the circuit.

Reference 12, page 231, states that metal line
delays equal 0.1ns/10mm and that polysilicon line delays
equal 200.0ns/10mm. Using these values a wire delay of 42ns
is calculated. The wire delays used in the above calculations
are based on a 6 micron design. When lambda is scaled down
the capacitance per unit length of wire stays constant but
the resistance scales up quadratically. Since lambda is
scaled down by a factor of 1.5 the wire resistance scales
up by 2.25. Multiplying the above wire delay by 2.25 gives
a total wire delay of 95ns.
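
The wire delay estimate follows the same pattern and can be checked numerically. The sketch below simply applies the per-length delays and run lengths quoted above and then the quadratic resistance scaling; the constant names are assumptions chosen for this illustration.

```python
# Line delays quoted from Reference 12 for a 6 micron design.
METAL_NS_PER_MM = 0.1 / 10
POLY_NS_PER_MM = 200.0 / 10

# Approximate run lengths measured from the layout, in mm.
METAL_MM = 3.9
POLY_MM = 2.1

# Lambda is scaled down by 1.5, so resistance scales up by 1.5 squared.
SCALE = 6 / 4
RESISTANCE_FACTOR = SCALE ** 2

unscaled_ns = METAL_MM * METAL_NS_PER_MM + POLY_MM * POLY_NS_PER_MM
scaled_ns = unscaled_ns * RESISTANCE_FACTOR

if __name__ == "__main__":
    print(f"unscaled wire delay = {unscaled_ns:.3f}ns")
    print(f"scaled wire delay   = {scaled_ns:.2f}ns")
```

The metal contribution is negligible, so the estimate is dominated by the 2.1mm of polysilicon.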

(4) Pad Delays. The signal delay for the output
pad is approximately 13ns [Ref. 13]. Due to the lack of
available information on the signal delay for the input pad
the delay calculated by Crystal of 8ns will be used in this
comparison. This gives a total pad delay of 21ns.

(5) Comparison of Results. In Table II a
comparison is given of the circuit delays calculated using
the Mead-Conway methods and those calculated by Crystal.
The logic delays calculated using the Mead-Conway methods are
less than those calculated by Crystal because delays caused by
the polysilicon wires connecting the gates together in the
data-path are not taken into account in the Mead-Conway
calculations. The total circuit delay of 172ns calculated
by the Mead-Conway methods is in close agreement with the
226.63ns delay calculated by Crystal. It can be concluded
that the circuit delay information given by Crystal is
accurate and can be used with confidence.
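
The totals in this comparison can be re-added from their components as a quick consistency check. The short Python sketch below sums the logic, wire and pad delays reported for each method and prints the ratio of the two totals; the dictionary names are assumptions for this illustration.

```python
# Component delays in ns, as reported for each method.
MEAD_CONWAY = {"logic": 56.0, "wire": 95.0, "pad": 21.0}
CRYSTAL = {"logic": 93.79, "wire": 105.84, "pad": 27.0}

mc_total = sum(MEAD_CONWAY.values())
crystal_total = sum(CRYSTAL.values())
ratio = mc_total / crystal_total

if __name__ == "__main__":
    print(f"Mead-Conway total = {mc_total:.0f}ns")
    print(f"Crystal total     = {crystal_total:.2f}ns")
    print(f"ratio             = {ratio:.2f}")
```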

                          TABLE II

               COMPARISON OF MEAD-CONWAY AND
                CRYSTAL DELAY CALCULATIONS

               | Mead-Conway |  Crystal
---------------+-------------+-----------
logic delay    |    56ns     |  93.79ns
wire delay     |    95ns     | 105.84ns
pad delay      |    21ns     |  27ns
total delay    |   172ns     | 226.63ns

3. Pipeline Circuits

a. Labeling Bidirectional Transistors

Before a timing analysis can be done on a MacPitts
design the bidirectional transistors in the circuit must be
identified by using the check command of Crystal and properly
labeled so that Crystal does not have to determine the
direction of signal flow through these transistors. The
MacPitts data storage elements, the register and the flag,
each have bidirectional transistors in them. The register
has five bidirectional transistors in each register cell and
the flag has one.

The procedure used to show the direction of 
signal flow through a bidirectional transistor is to attach a 
transistor attribute label to the transistor using Caesar. A 
transistor attribute label has the following form: 

Cr:(label)$ 

The label must be placed exactly in the middle of the source 
or drain edge of the gate region of the transistor. This is 
done by placing the center of the Caesar bounding box over 
the center of the source or drain edge of the gate and 
typing the following Caesar command: 

: la Cr:(label)$ center 

Figure 4.4 shows a stipple plot of a bidirectional 
transistor. The center of the bounding box is on the center 
of the source edge of the gate region and the transistor 
attribute label Cr:A$ has been affixed to this point. 

If a bidirectional transistor is not electrically 
connected to any other bidirectional transistor, the transistor 
attribute label should be placed on the source edge of the 
gate. If two or more bidirectional transistors are 
electrically connected, the same attribute label should be 
placed on each transistor on the side of the gate that 
shares the electrical connection to the other transistors. 
Bidirectional transistors that are not electrically 
connected should have different labels. 

Figure 4.4. Placement of a Transistor Attribute 
Label on a Bidirectional Transistor 
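
The labeling rule can be illustrated with a small Python sketch that groups electrically connected bidirectional transistors and gives each group its own letter. This is not part of Caesar or Crystal; the function name, the transistor names and the connection list are hypothetical, and the sketch only decides which label text each transistor receives, not which gate edge it is placed on.

```python
from string import ascii_uppercase


def assign_attribute_labels(transistors, connections):
    """Return {transistor: 'Cr:X$'} with one letter per connected group.

    transistors: names of the bidirectional transistors (hypothetical names).
    connections: pairs of transistors that are electrically connected.
    Supports at most 26 groups in this sketch.
    """
    parent = {t: t for t in transistors}

    def find(t):
        # Follow parent links to the group representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in connections:
        parent[find(a)] = find(b)

    letters = {}
    labels = {}
    for t in transistors:
        root = find(t)
        if root not in letters:
            letters[root] = ascii_uppercase[len(letters)]
        labels[t] = f"Cr:{letters[root]}$"
    return labels


# A register cell: one isolated transistor and two inverter pairs.
cell = ["m1", "m2", "m3", "m4", "m5"]
pairs = [("m2", "m3"), ("m4", "m5")]
print(assign_attribute_labels(cell, pairs))
```

With these inputs the isolated transistor receives its own label and each connected pair shares one, which matches the register cell labeling described for Figure 4.5.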

Figure 4.5 shows the stipple plot of a register 
cell that has five bidirectional transistors labeled with 
transistor attributes. The bidirectional transistor labeled 
Cr:A$ is not electrically connected to any other bidirectional 
transistor, so the source side of its gate has been labeled. 
The two transistors labeled Cr:B$ are the pull-up and 
pull-down transistors of an inverter. Due to the unusual MacPitts 
inverter structure, Crystal could not determine the direction 
of signal flow and identified the pull-up and pull-down 
transistors as bidirectional. Since the two transistors 
are electrically connected, the same transistor attribute 
label has been placed on the sides of the gates that are 
connected. The transistors labeled Cr:C$ are the pull-up and 
pull-down transistors of another inverter. Figure 4.6 shows 
the transistor attribute labeling for the one bidirectional 
transistor in a flag. 

b. Crystal Commands for Clocked Circuits 

(1) Problems Analyzing a MacPitts Design. 
Crystal was designed to be used for a non-overlapping 
clocking scheme. The overlapping clock phases and the five 
segment period of the MacPitts clock (see Figure 2.2) make 
the MacPitts pipeline adder circuit much more difficult to 
analyze. 

Figure 4.5. Transistor Attribute Labels for a Register Cell 






Figure 4.6. Transistor Attribute Label for a Flag 

When Crystal does a timing analysis of a 
clocked circuit, it assumes that each clock phase (or 
clock period segment in the case of a MacPitts design) is 
long enough for the combinational logic in the circuit to 
settle. But in a MacPitts circuit the first and second 
clock period segments, t1 and t2, are both used for the 
settling time of the combinational logic. Crystal will give an 
overly long delay for t1 of a MacPitts design because all of 
the logic propagation delay will be assigned to this segment. 

Another problem is that it will not be 
possible to determine the logic delay of any stage in the 
pipeline if the delay of the clock phase signals phia, phib 
and phic getting to the registers or flags is longer than 
the stage logic delays. This is because Crystal only gives 
the timing delay for the critical, or longest, path in the 
circuit. 

These problems are solved by dividing the 
timing analysis of the MacPitts pipeline design into two 
parts. First, the clocked registers and flags of the chip 
are analyzed for the timing delay of the input clock phase 
signals; then the combinational logic in each pipeline 
stage is analyzed to determine the slowest stage in the 
pipeline system. 

(2) Register and Flag Delays. The first step 
in performing a timing analysis of the clocked registers and 
flags is to edit the MacPitts circuit using Caesar. All 
logic except the flags block, registers and the ground, 
power and clock pads is deleted from the circuit. This is 
done so that Crystal does not use the extraneous circuitry 
in determining the critical path through the registers and 
flags. Next, the registers are deleted from the circuit 
because the clock phase signals will take longer to reach 
the flags than the registers. This is because the registers 
are closer to the clock pads on the clock bus and 
the clock phase signals are further delayed in the flags block 
by two inverters. Finally, the input and output lines of 
each flag are disconnected from the extraneous data lines 
going to the Weinberger array, if this has not already been 
done, and the input and output wires of each flag are 
labeled (see Figure 4.7). Figure 4.8 shows what the edited 
circuit looks like for a 4-bit 5-stage pipeline adder. 

The timing analysis of a clocked circuit 
is similar to that of a combinational circuit except that 
there is a separate set of delay and critical commands for 
each clock phase. For the MacPitts overlapping clock there 
is a separate set of delay and critical commands for each of 
the five segments of the clock period. The clear command is 
used between each set of delay and critical commands. Prior 
to the delay commands, the clock phases that do not change 
state during a segment of the clock period should be set to 
the high or low state using the set command [Ref. 11]. Inputs 
that are set to a state are not used by Crystal to determine 
the critical path because they do not have a state 
transition. Also, if delay commands are not given for inputs 
other than the clock phases, Crystal assumes that those 
input signals stabilize long before the start of the clock 
period. Crystal then determines the longest critical 
path in the circuit regardless of the state of the 
non-delayed inputs. Figure 4.9 lists the Crystal commands 
used to analyze the clock phase delays through the flags 
block. 

Figure 4.7. Disconnecting the Flag Input and Output Lines 

Figure 4.8. 4-Bit 5-Stage Pipeline Adder Edited for Crystal Analysis 
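
The structure of the command sets described above (set, delay, critical, with clear between sets) can be sketched as a small Python generator. This is only an illustration: the function names and the data layout are assumptions, and the two numbers following each delay command are copied verbatim from Figure 4.9 rather than interpreted here.

```python
def segment_commands(fixed, transitions):
    """Build the Crystal commands for one clock period segment.

    fixed:       {phase: 0 or 1} for phases held constant in the segment.
    transitions: {phase: (a, b)} delay arguments, copied from Figure 4.9.
    """
    lines = []
    for level in (1, 0):
        held = [phase for phase, value in fixed.items() if value == level]
        if held:
            lines.append(f"set {level} " + " ".join(held))
    for phase, (a, b) in transitions.items():
        lines.append(f"delay {phase} {a} {b}")
    lines.append("critical")
    return lines


def build_script(segments):
    """Join the per-segment command sets, separated by clear commands."""
    lines = []
    for index, (fixed, transitions) in enumerate(segments):
        if index > 0:
            lines.append("clear")
        lines.extend(segment_commands(fixed, transitions))
    return lines


# The five segments of the MacPitts clock period, as read from Figure 4.9.
SEGMENTS = [
    ({"phia": 1, "phic": 1}, {"phib": (0, -1)}),
    ({"phia": 1}, {"phib": (-1, 0), "phic": (-1, 0)}),
    ({"phib": 0, "phic": 0}, {"phia": (-1, 0)}),
    ({"phib": 0, "phic": 0}, {"phia": (0, -1)}),
    ({"phia": 1, "phib": 0}, {"phic": (0, -1)}),
]

for line in build_script(SEGMENTS):
    print(": " + line)
```

Generating the commands from a table of segments makes it easier to confirm that every clock phase is either set or given a delay command in each segment.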

(3) Pipeline Stage Delays. A separate Crystal 
timing analysis must be performed on the combinational logic 
in each pipeline stage in order to obtain propagation delays 
for each stage. First, the input and output signals of each 
stage must be determined. Input signals come from input pads 
or from register or flag outputs. Output signals are inputs 
to registers, flags or output pads. Next, using Caesar, the 
input and output lines of each stage are disconnected from 
any logic elements that are not part of that stage. This is 
done so that Crystal does not use circuitry that is not part 
of a stage in determining the critical path through that 
stage. Labels are then placed on all input and output lines. 

Figures 4.10 and 4.11 show two different 
circuits before they are edited using the above procedure 
and Figures 4.12 and 4.13 show the circuits after they have 
been edited. In Figure 4.12 node c1 is the output line of 
stage 1 of the pipeline and has been disconnected from the 
input of the storage register cell. Node d1 is the input line 
of stage 2 and has been disconnected from the output of the 
register cell. In Figure 4.13 nodes o1 and p1 are output lines 
of stage 4 and have been disconnected from the input lines of 
the storage flags. Nodes o2 and p2 are inputs of stage 5 and 
have been disconnected from the output lines of the flags. 

% script 
% crystal addp4.sim 
: inputs in<16:1> phia phib phic 
: outputs out<16:1> 
: set 1 phia phic 
: delay phib 0 -1 
: critical 
: clear 
: set 1 phia 
: delay phib -1 0 
: delay phic -1 0 
: critical 
: clear 
: set 0 phib phic 
: delay phia -1 0 
: critical 
: clear 
: set 0 phib phic 
: delay phia 0 -1 
: critical 
: clear 
: set 1 phia 
: set 0 phib 
: delay phic 0 -1 
: critical 
: quit 

Figure 4.9. Crystal Commands: Timing Delay of Clock Phases 

Figure 4.10. A Register Cell Before Editing 

Figure 4.11. Flag Cells Before Editing 

Figure 4.12. A Register Cell After Editing 

Figure 4.13. Flag Cells After Editing 

After all stages have been isolated and the input 
and output lines labeled, a .cif file is created using Caesar 
and then a .sim file is created using Mextra. A Crystal timing 
analysis is then performed on each stage in the pipeline 
using the same procedure used for a combinational logic 
circuit. 

B. DESIGN COMPARISONS 

Three important parameters used when comparing the 
performance of integrated circuit designs are chip size, 
power and speed. 

In order to determine the speed of a MacPitts pipeline 
design, the logic delay in each stage and the clock phase 
delays must be compared. The propagation time of the slowest 
stage in the pipeline is compared to the sum of the first 
two segments of the clock period, t1 and t2. This is because 
all logic propagation in the circuit must be settled before 
t3, when the inputs to all storage registers and flags are 
sampled. The larger of these two times is then added to t3, t4 
and t5 to determine the clock period. 
Table III shows the propagation delay for each stage of 
a 4-bit pipeline adder and an 8-bit pipeline adder (4 micron 
designs). The long delays in stages 2 through 5 of each 
adder are caused by long delays through the Weinberger array 
and the long high-resistance polysilicon runs carrying data 
from the registers to the array and carrying data back and 
forth between the flags block and the array. The delays through 
the Weinberger array are due to three factors. First, the 
inputs to the array from the registers and flags are driven 
by k=4 inverters. These inverters, which are not super 
buffered, drive up to five NOR gates in the array, thus 
adding substantial delay to the stage [Ref. 12]. This delay 
could be considerably reduced if the outputs of all 
registers and flags were super buffered. Second, the 
propagation delay in the array is high due to the large 
number of nested NOR gates in the array. In some cases up to 
five NOR gates are nested to perform a particular function 
(e.g., an XOR function). This is much more delay than would 
be found in the two level nesting of a PLA. Third, the 
delays in the array are increased by the long polysilicon 
lines that connect the inputs and outputs of the NOR gates. 
In some cases an output of a NOR gate is connected to the 
input of another NOR gate by a polysilicon wire that runs 
nearly the total width of the array. The increase in stage 
propagation delay of the 8-bit adder when compared to the 
4-bit adder is due to the increased size of the Weinberger 
array of the 8-bit adder and not due to a poorly designed 
pipeline chip. 

TABLE III 

PIPELINE STAGE DELAY 

STAGE | 4-BIT PIPELINE ADDER | 8-BIT PIPELINE ADDER 
1     | 33.59ns              | 51.87ns 
2     | 126.14ns             | 255.53ns 
3     | 106.60ns             | 222.89ns 
4     | 142.70ns             | 250.63ns 
5     | 141.63ns             | 203.87ns 

In Table IV the delay in each clock period segment is 
listed. The long delays are due to the input clock pads not 
being super buffered. One k=4 inverter on each clock pad 
must drive eight k=4 inverters: one inverter for each of the 
seven registers and one input inverter to the flags block. 
Each of the input inverters of the registers and flags block 
causes further delay because it is not super buffered but 
must drive many register cells or flags. In the case of the 
8-bit pipeline adder one k=4 inverter must drive twenty-seven 
flags. Additional delay is caused by the long clock 
bus. The clock signals must traverse a length nearly equal 
to the height and width of the chip before reaching the 
flags block. If the clock input pads, the input inverters, 
all registers and the flags block were super buffered, the 
timing delay of each clock period segment would be 
substantially improved. 

Comparing Tables III and IV it can be seen that the sum of 
the propagation delays through clock period segments t1 and t2 
is greater than the delay of the slowest stage for both the 
4-bit and 8-bit pipeline adders. Thus, the clock period is 
found by adding t1 through t5. The clock period of the 4-bit 
5-stage pipeline adder is 486.47ns (2.055 MHz clock) and the 
clock period of the 8-bit 5-stage pipeline adder is 706.32ns 
(1.415 MHz clock). 
TABLE IV 

CLOCK SIGNAL DELAY 

CLOCK PERIOD SEGMENT | 4-BIT PIPELINE ADDER | 8-BIT PIPELINE ADDER 
t1                   | 116.00ns             | 170.46ns 
t2                   | 66.62ns              | 102.66ns 
t3                   | 82.93ns              | 106.96ns 
t4                   | 100.87ns             | 153.56ns 
t5                   | 120.05ns             | 172.68ns 
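
The clock period rule described in this section (the larger of the slowest stage delay and t1 + t2, plus t3, t4 and t5) can be sketched in Python. The delay values below are those read from Tables III and IV; the function and variable names are illustrative assumptions.

```python
# Stage delays (Table III) and clock segment delays t1..t5 (Table IV), in ns.
STAGE_DELAYS_NS = {
    "4-bit": [33.59, 126.14, 106.60, 142.70, 141.63],
    "8-bit": [51.87, 255.53, 222.89, 250.63, 203.87],
}
SEGMENT_DELAYS_NS = {
    "4-bit": [116.00, 66.62, 82.93, 100.87, 120.05],
    "8-bit": [170.46, 102.66, 106.96, 153.56, 172.68],
}


def clock_period_ns(stage_delays, segment_delays):
    """Settling time (slowest stage or t1 + t2, whichever is larger) plus t3..t5."""
    t1, t2, t3, t4, t5 = segment_delays
    settle = max(max(stage_delays), t1 + t2)
    return settle + t3 + t4 + t5


def clock_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


for design in ("4-bit", "8-bit"):
    period = clock_period_ns(STAGE_DELAYS_NS[design], SEGMENT_DELAYS_NS[design])
    print(f"{design}: {period:.2f}ns, {clock_frequency_mhz(period):.3f} MHz")
```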
Table V lists the chip size, power and speed of several 
4 micron combinational and pipeline adder circuits designed 
by MacPitts. In addition, a 16-bit 4-stage pipeline adder 
that was designed by hand is also listed [Ref. 5]. (See 
Figure 4.14.) 

Chip size and worst case static power consumption are 
standard outputs from the MacPitts silicon compiler. The 
required power for the hand designed adder was found by 
using a program called powest that makes an estimate of the 
DC power required in a circuit based on the number of 
enhancement and depletion mode transistors in the circuit. 
Powest takes a .sim file as input and prints on the terminal 
screen the average DC power (based on one-half of the 
transistors being on at any time) and the maximum DC power 
(based on all transistors being on). The 
value of power listed in Table V for the hand designed adder 
is the maximum DC power. The command to run powest is: 

powest -p ^ filename.sim 

For comparison, powest was run on all of the MacPitts 
designs and the power estimates calculated by powest and 
MacPitts were, on average, within 10% of each other. 

All chip speed values listed in Table V were calculated 
by Crystal. Reference 5 estimates the clock speed of the 
16-bit 4-stage pipeline adder as 8 MHz. This is seven times 
faster than the 1.141 MHz calculated by Crystal. The reason 
for the discrepancy is that reference 5 does not take into 
account that the clock pads in the circuit, which are not 
super buffered, must drive a large number of pass transistors. 
Clock pad phia drives 138 pass transistors that clock data 
into the five PLAs in the circuit while clock pad phib drives 
121 pass transistors that clock data out of the PLAs. 

TABLE V 

PERFORMANCE COMPARISONS OF DESIGNS 

Figure 4.14. Hand Designed 16-Bit 4-Stage Pipeline 
Adder 

Another interesting observation about the hand designed 
circuit from reference 5 is that when the circuit is examined 
using Caesar a misalignment of one-half lambda between the 
data, power and ground buses going into the PLAs and the PLA 
blocks is found. As seen in Figure 4.15, the bus 
misalignments are not enough to disconnect any wires. 

As expected, when the combinational adder circuits were 
converted to pipeline circuits the chip size and power 
increased, but the increase in chip throughput (or speed) 
anticipated in a pipeline design did not occur. The slow 
circuitry of the Weinberger array, non-super buffered clock 
pads and long polysilicon runs in the MacPitts pipeline 
circuits caused excessive delays and decreased performance 
below that of the combinational circuits. The excessive 
delays could be reduced if the Weinberger array were redesigned 
to reduce the NOR gate nesting or replaced by a PLA, if all 
input lines to the array were super buffered and if the long 
polysilicon runs were replaced with metal or diffusion runs. 
If the design of a 16-bit pipeline adder were possible, it is 
expected that this design would have a clock speed lower than, 
and a chip area much larger than, that of the hand designed 
adder. Even with fast logic in each pipeline stage and super 
buffered clocks, the fact that the last three segments of the 
MacPitts clock period cannot be used for logic propagation 
ensures that the MacPitts pipeline designs will be slower than 
any well designed hand-crafted circuit. 
V. MACPITTS DESIGN ERRORS 

A. INTRODUCTION 

Although the MacPitts silicon compiler is expected to 
generate error free designs, several cases have been found 
where design errors have been made. These design errors 
fall into two categories: wiring errors and alignment 
errors. Wiring errors have occurred when wires become 
electrically connected when they should not be, and alignment 
errors have occurred when circuitry has been placed 
incorrectly on the chip so that it does not align properly 
with adjacent circuitry. 

B. WIRING ERRORS 

1. Description of Errors 

A fatal wiring error was discovered in which 
the MacPitts compiler electrically connected all three clock 
lines that run in the clock bus below the data-path to a 
data line running from the data-path to the 
Weinberger array. This error was found to occur whenever the 
last organelle of the data-path or sequencer is the 
organelle used by the compiler to transfer data from the 
data-path to the Weinberger array (see Figure 5.1). The 
vertical polysilicon data wire of this organelle runs 
parallel to, and only four lambda away from, a large ground bus 
line that is always placed on the right edge of the 
data-path and sequencer. The horizontal clock bus must make 
metal-to-polysilicon and polysilicon-to-metal vias over this 
ground bus. Since the data wire runs so close to the ground 
bus, it crosses the clock bus at the metal-to-polysilicon via 
and becomes electrically connected to the clock lines (see 
Figure 5.2). This error was also found by Kelly (as 
mentioned in [Ref. 4]) when he used MacPitts to produce a 
butterfly switching element chip at MIT Lincoln Laboratory. 
Unfortunately, this error cannot be identified when a design 
rule check is made on the circuit because no design rules 
are violated. 

Figure 5.1. MacPitts Wiring Error 

It is not difficult to predict when this wiring error 
is going to occur in the data-path and to correct it when it 
is found. A programmer should first examine the MacPitts 
.mac program to identify all statements that cause word size 
operations to be performed and cause the compiler to produce 
an organelle in the data-path. If the last word size 
statement in the .mac program uses the "bit" data-path 
function of the form: 

(bit <bit-position> <integer-expression>) 

the organelle that transfers data from the data-path to the 
Weinberger array will be placed on the right edge of the 
data-path and a fatal wiring error will occur. (See [Ref. 2] 
for a description of the bit function.) 
Figure 5.2. Close-up of Wiring Error 

It is more difficult to predict when this wiring 
error is going to occur in the sequencer than in the 
data-path. Reference 8 contains details of sequencer wiring 
errors. 

2. Correction of Wiring Errors 

The wiring errors in the data-path and sequencer can 
be easily corrected using the Caesar VLSI circuit editor. 
The Caesar file that contains the last organelle of the 
data-path or the sequencer must first be identified. This 
file is then edited using Caesar and the right one or two 
data lines are rerouted around the clock bus via as shown in 
Figure 5.3. 

If it has been determined that the "bit" function is 
the last word size statement in the .mac program, the steps 
used in the MacPitts design cycle of a 5 micron design that 
are listed on page 68 of reference 4 should be modified as 
follows: 

1. Generate a 5 micron .cif file as stated. The following 
command will create several Caesar files, each containing 
a description of part of the design. (Ignore the user 
extension warning.) 

% cif2ca -1 250 filename.cif 

2. Rename the top level Caesar file. 

% mv project.ca filename.ca 

3. Use Caesar to identify the Caesar file, symbol xx.ca, 
that has the wiring error in it. 

% caesar filename 

The Caesar file for the complete data-path/sequencer 
may have to be edited in Caesar to identify the file 
that contains the last organelle of the data-path/ 
sequencer where the wiring error is located. Caesar 
can then be used to reroute the data lines around the clock 
bus via. 

4. Edit the top level Caesar file again and create a new 
.cif file. 

: sa 

: cif 248 

: q 

5. Next, perform a design rule check of the new .cif file. 
(Note that the cif command line ends in -qnq, not -gng, 
in the following command.) 

% cif filename.cif -qnq 

% cll filename.co 

% drc filename.sco 

6. To perform an event simulation on the modified 5 micron 
design, the procedure listed on page 71 of reference 4 
for the 4 micron design should be followed to affix 
labels to the bonding pads, obtain a node extract, 
and start the simulation run. Ensure that the 248 
scale is used when creating a new .cif file of a 5 
micron design in Caesar (see page 96 of reference 4). 

Figure 5.3. Correction of Wiring Error 

For a 4 micron design that contains wiring errors, 
the MacPitts design cycle listed on page 70 of reference 4 
should be followed. The wiring errors can be corrected, 
using the above procedure, at the same time that the labels 
are affixed to the bonding pads. 
C. ALIGNMENT ERRORS 

Alignment errors have been found in the flags block and 
the Weinberger array of several different designs. Most of 
the alignment errors that were found were identified by 
performing a design rule check on the circuit that contained 
the errors. The design rule check program is able to find 
the errors because in most cases metal-to-metal, 
polysilicon-to-polysilicon or diffusion-to-diffusion 
separation errors occur. 

In the flags block the errors have occurred when the 
compiler places the flags block on the chip so that the 
internal clock, ground and data buses of the block do not 
properly align with the chip clock, ground and data buses. 
The misalignment of the flags block has been found in three 
designs: the 4 micron 4-bit 5-stage pipeline adder and both 
the 4 micron and 5 micron 8-bit 5-stage pipeline adders. In 
each case the circuitry inside the flags block has been 
designed correctly but the block itself has been placed 
incorrectly on the chip. 

In the case of the 4 micron 4-bit 5-stage pipeline 
adder the flags block was placed two lambda too high in the 
circuit. Figure 5.4 shows that the flags block ground bus 
does not properly connect with the chip ground bus. In 
Figure 5.5 the metal-polysilicon contacts of the flags block 
clock lines do not properly align with the metal lines of 
the chip clock bus. The flags block of the 5 micron 8-bit 
5-stage pipeline adder is placed two lambda too high and 
one-half lambda too far left on the chip. Figure 5.6 shows 
the two lambda misalignment of the flags block clock lines 
and a one-half lambda misalignment of the flags block data 
lines. The flags block data lines have a two lambda overlap 
with the chip data lines, so even with a two lambda vertical 
flags block misalignment the data lines are still 
electrically connected. A flags block misalignment of eight 
lambda in the vertical direction was found in the 4 micron 
8-bit 5-stage pipeline adder. Figure 5.7 shows the clock 
and data bus alignment errors for this circuit. 

The Weinberger array alignment errors are more complex 
than the flags block errors. In addition to errors where 
the Weinberger array is placed incorrectly on the chip there 
are also some internal alignment errors in the array. Figure 
5.8 shows three misalignments of the Weinberger array buses 
and the chip buses. Also shown is one internal misalignment 
where a diffusion line is not properly connected to a pull-up 
transistor. Weinberger array alignment errors will be 
treated in detail in reference 8. 

The cause of alignment errors is not yet understood. 
Alignment errors have only been found in MacPitts designs 
since the MacPitts compiler was installed under the UNIX 4.2 
operating system. No alignment errors were found when the 
compiler was installed under the UNIX 4.1 operating system. 
It is thought that the version of the Franz LISP compiler 
installed under UNIX 4.2 may be causing an unexpected 
roundoff or truncation when the compiler calculates the 
vertical and horizontal coordinates used to place circuitry 
on the chip. Alignment errors can be corrected by using 
Caesar. 

Figure 5.5. Clock Bus Misalignment Error: 4 Micron 4-Bit 
5-Stage Pipeline Adder 

Figure 5.6. Clock and Data Bus Misalignment Errors: 5 Micron 
8-Bit 5-Stage Pipeline Adder 

Figure 5.7. Clock and Data Bus Misalignment Errors: 4 Micron 
8-Bit 5-Stage Pipeline Adder 

Figure 5.8. Weinberger Array Alignment Errors 
VI. CONCLUSION 



A. SUMMARY 

The objectives of this thesis were to determine what 
the basic circuits in MacPitts designs are and how they are 
used, to make performance comparisons of several different 
adder designs with a hand-crafted adder design and to obtain 
a better understanding of the MacPitts interpreter. 

The basic building blocks that the MacPitts compiler 
uses in circuits were found to be the data-path, the 
sequencer, the flags block and the Weinberger array. The 
circuit density and speed of the building blocks were found 
to be low. This was expected since Siskind was quoted in 
reference 14 as stating that optimizing chip performance was 
not a primary MacPitts design goal. The functional 
description of the circuit in the .mac program was found to 
have a direct relationship to the circuit structures that 
the compiler used to design the circuit. 

It was found that circuits designed by the MacPitts 
silicon compiler are very inefficient in terms of the amount 
of circuitry per chip area and that the speed of a MacPitts 
circuit is slow compared to hand-crafted designs. The 
significant advantage that MacPitts-designed circuits have 
over hand-crafted circuits is the reduction in time required 
to design the circuit. This makes silicon compilers an 
attractive alternative when the time required to design a 
circuit is a more important consideration than the speed or 
size of the circuit. Until the cause of the alignment errors 
discussed in Chapter V is found and corrected, all MacPitts 
designs must be inspected carefully for the possibility of 
alignment errors. Unexpectedly, it was also found that 
combinational adder circuits were faster than pipeline adder 
circuits because of the MacPitts clocking scheme and the 
timing delay caused by the non-super buffered clock lines 
driving the registers and flags. 

Appendix A gives a complete list of all the MacPitts 
interpreter commands and an explanation of their use. In 
addition, all interpreter error statements and their 
definitions are listed. 

In 1983 the developers of the MacPitts silicon compiler 
(Siskind, Southard, and Crouch [Ref. 3]) left MIT Lincoln 
Laboratory and formed their own company, MetaLogic, Inc., to 
produce a commercial silicon compiler. MetaLogic's current 
compiler, called MetaSyn, is a redesigned version of the 
MacPitts compiler. Most of the design limitations of 
MacPitts have been eliminated in MetaSyn. Two of the more 
significant improvements in MetaSyn are the redesign of the 
interpreter and the Weinberger array. The new interpreter, 
now called the simulator, is very flexible and user friendly 
and has few of the limitations of the MacPitts interpreter 
listed in Appendix A. The Weinberger array has been 
redesigned to improve circuit speed. A PLA is now used for 
all chip control functions and the Weinberger array is used 
only for bit sized boolean functions. The recommended 
MacPitts improvements listed below, except for #2, #5 and 
#6, have been incorporated in MetaSyn. 



B. RECOMMENDATIONS 

The following recommendations should be considered to 
improve the MacPitts silicon compiler: 

1. Add super buffers to all input pads. 

2. Add super buffers to all data lines leaving the 
data-path, sequencer and flags block, and to all clock 
lines driving the registers and flags. 

3. Redesign the design frame to allow pads on all sides. 

4. Use channel routing instead of river routing to reduce 
the complexity of the Weinberger array. 

5. Implement a faster algorithm for design of the 
Weinberger array. 

6. Redesign the registers and flags so that a more 
conventional two-phase clock can be used in MacPitts 
designs. This will eliminate the circuit delay of the 
last three segments of the MacPitts clock that cannot 
be used for logic propagation. 

7. Redesign the interpreter to make it more user friendly 
and able to handle large designs containing many flags, 
ports, signals, registers and processes, as discussed 
in Appendix A. 

8. As mentioned in Chapter III, a data-path organelle 
should be designed to set and shift data bits of a 
data word so that data can be transferred from the 
Weinberger array to the data-path. 

9. Redesign the data-path so that data can enter or leave 
the data-path from either the left or right side to 
reduce the length of wire runs from the pads. 

10. Redesign the flags block and the data-path organelles 
to save the wasted space illustrated in Chapter II. 
APPENDIX A 

THE MACPITTS INTERPRETER 

A. USE OF THE INTERPRETER 

The MacPitts interpreter is used to test for syntax and 
logical errors in the .mac file. The interpreter creates a 
functional environment of the integrated circuit from the 
.mac file without actually designing the circuit. This 
functional environment can then be simulated. 

The interpreter can be invoked by using the following
command:

% macpitts filename int herald 

Filename is the name of the .mac file without the .mac
extension. Herald causes the MacPitts silicon compiler to
print a message to the terminal each time it reaches a
milestone in processing the .mac file. Although the
herald statement can be omitted, the milestone messages
assure the programmer that the silicon compiler is still
processing the .mac file on long compile runs.

When the interpreter is ready to start processing a
simulation run, all registers, ports, processes, flags and
signals defined in the .mac file are listed in a table
on the terminal screen along with their values (see
Figure A.1). The first thirty-six items displayed in the
table are labeled from 0-9 and a-z. The MacPitts
REGISTERS
1: a0 = undefined
2: a1 = undefined
3: sto = undefined
4: w2 = undefined

PORTS
8: ain = undefined
9: bin = tri-state
a: res = undefined

FLAGS
5: q1 = undefined
6: r1 = undefined
7: carry = undefined

SIGNALS
b: reset = undefined
c: cin = undefined
d: cout = undefined

PROCESSES
e: countup = (undefined)
f: countdown = (undefined)

Ready                                        Value

Figure A.1. The Interpreter Screen Display

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interpreter does not have the ability to label more than 
thirty-six items so items thirty-seven and higher are not 
labeled. At the bottom of the screen a command line is 
displayed. The command line shows the status of the 
interpreter at any time. Possible command line displays are 
Ready, indicating that the interpreter is ready to accept a 
command, and Clocking, indicating that the interpreter is 
performing a functional simulation of the chip through one 
or more clock cycles. On the bottom right of the screen the 
contents of a special interpreter register called "value" 
are shown. The value register is used to set ports and 
registers to particular values and also indicates the number 
of clock cycles a simulation run will execute. 
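
The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the thirty-six tag limit, not the interpreter's own code; the function names are hypothetical, and the zero-based starting position is an assumption taken from the text (Figure A.1 begins its tags at 1).

```python
import string

# 0-9 followed by a-z: the thirty-six tags described in the text.
TAGS = string.digits + string.ascii_lowercase


def tag_for(position):
    """Return the tag for a zero-based display position.

    Positions beyond the thirty-six available tags return None,
    mirroring the unlabeled items thirty-seven and higher.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return TAGS[position] if position < len(TAGS) else None


def label_items(names):
    """Pair each item name with its tag (None when no tag is left)."""
    return [(tag_for(i), name) for i, name in enumerate(names)]
```

An item without a tag cannot be reached with the "G" command, which is why large designs are awkward to drive from the keyboard.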

There is one serious limitation with the interpreter
that causes it to be unusable for many large chip designs.
If the total number of registers, ports and processes
defined in the .mac file is greater than twenty, there will
be too many items for the interpreter to display on the
right side of the terminal screen at once (see Figure A.1).
Also, if the total number of flags and signals is greater
than twenty-two, there will be too many items for the
interpreter to display on the left side of the terminal
screen at once. Unfortunately, the interpreter continues to
try to display those items that will not fit on the screen.
Since the interpreter is never able to display all items,
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control of the terminal is never turned over to the 
programmer for a simulation run. The only way to stop the
interpreter if this happens is to abort the run by typing
Control-Z.
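
Because the interpreter gives no warning before it hangs, it can be useful to count the items in a design by hand and compare them with the two limits quoted above. The following Python sketch does only that arithmetic; the function name and the way the counts are supplied are assumptions made for illustration, and nothing here parses a .mac file.

```python
# Limits quoted in the text for one terminal screen.
RIGHT_SIDE_LIMIT = 20  # registers + ports + processes
LEFT_SIDE_LIMIT = 22   # flags + signals


def display_problems(registers, ports, processes, flags, signals):
    """Return a list of reasons the design will not fit on the screen.

    An empty list means both item groups are within the quoted limits.
    """
    problems = []
    right = registers + ports + processes
    left = flags + signals
    if right > RIGHT_SIDE_LIMIT:
        problems.append(
            f"{right} registers/ports/processes exceed {RIGHT_SIDE_LIMIT}"
        )
    if left > LEFT_SIDE_LIMIT:
        problems.append(f"{left} flags/signals exceed {LEFT_SIDE_LIMIT}")
    return problems
```

For the small design of Figure A.1 (four registers, three ports, two processes, three flags and three signals) the list is empty, so the interpreter can be used.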

The interpreter uses information from three different
locations to determine the values of all registers, ports,
signals, flags and processes during a simulation run. The
first location is the "console", where the programmer, using
the terminal keyboard, can specify the values of the above
items. The second location is the functional environment of
the circuit, called the "chip". This is where the
interpreter uses input information from the programmer to
determine the values of the above items. The last location
is called the "environment" and is a programmer-specified
functional environment that the programmer may have the
interpreter use during simulation (see the "e" command
below).

B. INTERPRETER COMMANDS 

All interpreter commands are screen oriented, which
means that a command is executed as soon as its key is
pressed and a carriage return is not necessary. Table VI
gives a list of the interpreter commands. These commands
can be displayed on the screen by typing "?".

Most of the interpreter commands are self-explanatory 
but several require additional explanation. Several commands 
TABLE VI

MACPITTS INTERPRETER COMMAND SUMMARY

?          This menu
           Repaint screen
p          Put interpreter state to <file-name>.int
g          Get interpreter state from <file-name>.int
e          Enable/Disable environment from <file-name>.env
c          Clock system <value> cycle(s)
l          Escape to Lisp system
q          Quit
j          Move cursor down
k          Move cursor up
G <tag>    Move cursor to <tag>
t          Set flag, input signal, or i/o signal to t
f          Set flag, input signal, or i/o signal to f
s          Set register, input port, or i/o port to <value>
u          Set register, flag, input port, i/o port,
           input signal, or i/o signal to undefined
T          Set i/o port or i/o signal to tri-state
X          Clear <value> register to 0
           Negate <value> register
<digit>    Enter <digit> into <value> register

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affect only one item on the screen. Before these commands 
can be used, the item to be affected must be highlighted by
the inverse cursor. The "j", "k" and "G" commands are used
to move the cursor around the screen. If an adm3a terminal
is used instead of a vt100 terminal, the inverse cursor is
not displayed and only the "G" command can be used to place
the "invisible" cursor over the item to be affected.

When the registers, ports, processes, flags and signals
are initially displayed on the screen by the interpreter,
their values are undefined, or tri-state if a tri-state port
or signal is defined in the .mac file. (See Reference 2 for
an explanation of the different register, port and signal
types.) Before a simulation run is made, all input and i/o
ports and signals must be set to some initial value. The "t"
and "f" commands are used to set input or i/o signals to
true or false, respectively. The "s" command is used to set
an input or i/o port to the value stored in the value
register. Another command, the "T" command, can also affect
the values of input or i/o ports and signals but has not
proven very useful. If the "T" command is used on an
input port or signal, or an i/o port or signal that is used
for input only in the .mac program, the port or signal value
will be set to a high impedance state (tri-state). The port
or signal value will stay at high impedance until explicitly
set to some value by the programmer using the "s", "t", or
"f" commands. If an i/o port or signal is used for output
only or for both input and output in the .mac program, the "T"
command will cause the port or signal value to change to
undefined.

The "c" command is used to simulate the functional 
environment of the chip. The number of clock cycles 
simulated in one simulation run is indicated by the value 
register. If 0 or 1 is stored in the value register only 
one clock cycle will be simulated. 

After a simulation run it may be desirable to store the
values of all items displayed on the screen. This can be
done by using the "p" command. The state of the functional
environment is saved in a file called filename.int, where
filename is the same as that of the filename.mac file. If
more than one state is to be saved, the programmer must log
in on another terminal and rename the .int file after each
state is saved, because each new state will be saved in the
same .int file.
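
The renaming step can be scripted so that it is done the same way after every "p" command. The sketch below is a hypothetical helper, not part of MacPitts: the function name, the label argument and the naming pattern filename.label.int are all assumptions chosen for illustration.

```python
from pathlib import Path


def archive_state(design, label, directory="."):
    """Rename <design>.int to <design>.<label>.int.

    Run this from a second terminal after each "p" command so that
    the next saved state does not overwrite the previous one.
    """
    src = Path(directory) / f"{design}.int"
    dst = Path(directory) / f"{design}.{label}.int"
    if not src.exists():
        raise FileNotFoundError(src)
    if dst.exists():
        # Refuse to clobber a state that was archived earlier.
        raise FileExistsError(dst)
    src.rename(dst)
    return dst
```

Since Table VI lists the "g" command as reading from the .int file of the design, an archived state would presumably have to be renamed back to filename.int before it could be reloaded.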

The programmer also has the option of specifying the
functional environment that the interpreter will use to
simulate a particular .mac file [Ref. 2]. The "e" command is
used to enable or disable a functional environment stored in
the filename.env file. There is no published information or
documentation on the format of the functional environment
in the .env file, so this option has never been used at the
Naval Postgraduate School.
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C. INTERPRETER ERROR STATEMENTS 

As stated before, the purpose of performing an 
interpreter simulation is to check for syntax and logical 
errors in the .mac file before a full chip design is made by 
the MacPitts compiler. Logical errors can be found by 
performing a simulation run and then comparing the results 
obtained to those expected. Reference 4, pages 47-49, gives
a good example of how to perform a simulation of a .mac file
using the MacPitts interpreter.

Syntax errors in the .mac file are indicated in one of
two ways by the compiler. First, if the error is severe
enough, the compiler stops the creation of the functional
environment and displays an error message on the terminal
screen that gives an indication of the syntax error. The
compiler then returns control to the UNIX operating system.
An example of a severe syntax error is an unequal number of
open and closed parentheses in the .mac file. Second, less
severe syntax errors usually do not show up until initial
values are loaded into the input or i/o ports and signals
or until a simulation run is performed. A short error
message is then displayed on the command line of the
terminal screen.
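
Unbalanced parentheses can be caught before the compiler is invoked at all. The following Python sketch is a simple pre-check written for illustration; it is not part of MacPitts, the function name is hypothetical, and it counts every parenthesis character without treating comments or quoted text specially.

```python
def paren_balance(text):
    """Check the parentheses in the text of a .mac file.

    Returns (depth, bad_line). depth is the nesting depth at the end
    of the text (0 when balanced, positive when closing parentheses
    are missing). bad_line is the 1-based line number of the first
    closing parenthesis that has no matching opening parenthesis, or
    None when there is no such line.
    """
    depth = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return depth, line_no
    return depth, None
```

A result of (0, None) means that the counts agree; any other result points to the kind of severe syntax error described above.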

There are over thirty different error messages that the
compiler can display when a syntax error is found. The error
messages and their meanings are as follows:



1. Interpreter error 1: the interpreter tried to change
the state (value) of a register but found the current
state to be empty (null), possessing no value. This
error indicates an improper register definition or
usage in the .mac file.

2. Interpreter error 2: same as 1 above but for a flag.

3. Interpreter error 3: same as 1 above but for a port.

4. Interpreter error 4: same as 1 above but for a signal.

5. Interpreter error 5: Unrecognizable function. Examples
of some expected functions are setq, not, bit, call
and if. See Reference 2 for a listing of all MacPitts
functions.

6. Interpreter error 6: the antecedent of an if statement
is not t, f or undefined as required.

7. Interpreter error 7: the interpreter tried to
determine the state (value) of a register but found the
current state to be empty (null), possessing no value.
This error indicates an improper register definition
or usage in the .mac file.

8. Interpreter error 8: same as 7 above but for a flag.

9. Interpreter error 9: same as 7 above but for a process.

10. Interpreter error 10: same as 7 above but for a port.

11. Interpreter error 11: same as 7 above but for a signal.
12. Unrecognizable atomic form: an unknown alpha-numeric
string is in the .mac file. Check for a missing
definition or misspelled word.

13. Process state out of bounds: the state (value) of
a process is less than zero. Check the .mac file for
a statement improperly setting a process to a value
less than zero.

14. This process has too many returns: a return from a
subroutine was encountered for which there was no
previous call statement. Check the .mac file for the
correct number of returns or for a missing call
statement.

15. This process has too many calls: a call to a
subroutine was made but no return statement was found.
Check the .mac file for the correct number of calls or
for a missing return statement.

16. Invalid bit selector: the bit selector in the
data-path function "bit" is not between 0 and the bit
size of the data-path as required.

17. Too many arguments: all MacPitts functions require
only one or two arguments. Check the .mac file and
Reference 2.

18. Too few arguments: see 17 above.

19. A reset signal is needed: a reset signal has not been
defined when the "process" form is used in the .mac
file.
20. Double signal (port) setq, chip vs. environment:
the interpreter attempts to set a signal (port) to a
value different from that assigned to that signal (port)
by the functional environment from the .env file.

21. Double signal (port) setq, chip vs. console: the
interpreter attempts to set a signal (port) to a value
different from that assigned to that signal (port) by
the programmer using the "s", "t" or "f" commands.

22. Double signal (port) setq, environment vs. chip:
the reverse of 20 above.

23. Double register setq: two different setq statements
in the .mac file attempt to assign a value to the
same register at the same time.

24. Double process setq: same as 23 above but for a
process.

25. Double port setq: same as 23 above but for a port.

26. Only one character per character-constant: this
error indicates that an attempt was made to set the
value of a constant to a character string longer than
one character. The value of a constant can be an
integer or a single character. If the value of a
constant is set to a single character, the ASCII
equivalent of that character becomes the value of the
constant.
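
The "double setq" errors 20 through 22 all describe the same situation: one signal or port is given different values by two of the three sources (chip, console, environment) during the same clock cycle. The Python sketch below illustrates that rule only. How the interpreter actually detects the conflict is not documented, so the data layout and the function name are assumptions.

```python
def find_setq_conflicts(assignments):
    """Find items driven to different values by different sources.

    assignments is an iterable of (source, item, value) tuples for a
    single clock cycle, where source is "chip", "console" or
    "environment". Returns a list of (item, first_source,
    other_source) tuples, one per disagreement with the first value
    recorded for that item.
    """
    first_seen = {}
    conflicts = []
    for source, item, value in assignments:
        if item not in first_seen:
            first_seen[item] = (source, value)
            continue
        prev_source, prev_value = first_seen[item]
        if prev_source != source and prev_value != value:
            conflicts.append((item, prev_source, source))
    return conflicts
```

Two sources that assign the same value to an item are not reported, which matches the wording of errors 20 and 21 ("a value different from that assigned").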

In addition to the above syntax error statements there 
are two syntax warning statements. These statements indicate 
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that there may be a syntax error and caution should be 
exercised during the simulation. These warning statements 
are: 

1. This process has undefined state: the interpreter
has encountered a process in the functional environment
whose state (value) is undefined.

2. Antecedent of if is undefined: the interpreter has
encountered a register, port, signal, process, or flag
in the functional environment that is being used as the
antecedent of an if statement and whose value is
undefined.

The above two warning statements are common for pipeline 
design architectures. Initially the value of the ports, 
registers, processes, signals, and flags of each stage of 
the pipeline are undefined and will stay undefined until 
data is clocked into and out of each stage. 
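
The way undefined values drain out of a pipeline can be seen in a small model. The Python sketch below is an illustration only: it treats each stage as a single register that copies its predecessor on every clock, and it uses None to stand for the interpreter's "undefined". The three-stage depth and the data values are arbitrary choices.

```python
UNDEFINED = None  # stands in for the interpreter's "undefined" value


def clock(stages, new_input):
    """Advance the pipeline one clock cycle.

    Each stage takes the value held by the stage before it, and the
    first stage takes the new input.
    """
    return [new_input] + stages[:-1]


stages = [UNDEFINED] * 3
history = []
for data in (10, 20, 30):
    stages = clock(stages, data)
    history.append(list(stages))

# history is now:
# [[10, None, None], [20, 10, None], [30, 20, 10]]
```

The last stage stays undefined until the third clock cycle, so the two warnings can be expected during the first cycles of any simulation of such a design.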

The MacPitts interpreter also displays error statements 
in the command line of the terminal screen if an interpreter 
command has been executed improperly by the programmer. 

The interpreter command error statements are: 

1. File not found: the .int or .env file cannot be found.

2. Cannot set this thing to value: only registers and
ports can be set to value.

3. Cannot set this thing to t, f: only signals or flags
can be set to t or f.
4. Cannot set this thing to undefined: processes
cannot be set to undefined.

5. Cannot set this thing to tri-state: only input or
i/o ports and signals can be set to tri-state.

6. Invalid command type ? for help: check interpreter
command list for correct command.

7. Cannot input from this port (signal): check for
input or i/o port (signal).

8. Cannot output to this port (signal): check for
output or i/o port (signal).
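
Command error statements 2 through 5 amount to a small table of which item-setting commands apply to which kinds of item. The Python sketch below restates that table for reference. It is a paraphrase of the list above, not the interpreter's code; the names are hypothetical, and the direction of a port or signal (input, output or i/o) is deliberately ignored.

```python
# Kinds of item each setting command accepts, as read from the
# command error statements above.
ALLOWED_KINDS = {
    "s": {"register", "port"},
    "t": {"signal", "flag"},
    "f": {"signal", "flag"},
    "u": {"register", "flag", "port", "signal"},
    "T": {"port", "signal"},
}

MESSAGES = {
    "s": "Cannot set this thing to value",
    "t": "Cannot set this thing to t, f",
    "f": "Cannot set this thing to t, f",
    "u": "Cannot set this thing to undefined",
    "T": "Cannot set this thing to tri-state",
}


def check_set_command(command, kind):
    """Return the expected error statement, or None if allowed.

    command is one of the item-setting keys "s", "t", "f", "u", "T";
    kind is "register", "port", "flag", "signal" or "process".
    """
    if command not in ALLOWED_KINDS:
        raise ValueError(f"{command!r} is not an item-setting command")
    if kind not in ALLOWED_KINDS[command]:
        return MESSAGES[command]
    return None
```

For example, using "t" on a register returns the statement "Cannot set this thing to t, f", while using "s" on the same register returns None.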
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